import tensorflow as tf

def bounding_box_regression(inputs):
    bounding_box_regression_output = tf.keras.layers.Dense(units=4, name='bounding_box')(inputs)
    return bounding_box_regression_output

Why are 4 units used here?
Those 4 units are the coordinates of the bounding box, that's why!
@gent.spah has it exactly right. The shape of the output layer of your network must match the number of values you want to predict, your \hat{y}.
The loss function compares those outputs to your training data, the y values, which must be the same shape.
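To make the shape requirement concrete, here is a minimal sketch in NumPy. It assumes mean squared error as the loss (the thread does not name one), and the box values are made up for illustration. The helper `mean_squared_error` is hypothetical, not a library function.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Hypothetical helper: MSE over all elements, refusing mismatched shapes."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: y {y_true.shape} vs y_hat {y_pred.shape}")
    return float(np.mean((y_true - y_pred) ** 2))

# Two training examples, four box values each (made-up numbers).
y = np.array([[0.1, 0.2, 0.5, 0.6],
              [0.3, 0.3, 0.4, 0.4]])
# Predictions from a 4-unit output layer have the same (batch, 4) shape.
y_hat = np.array([[0.1, 0.2, 0.5, 0.6],
                  [0.3, 0.3, 0.4, 0.8]])

loss = mean_squared_error(y, y_hat)
print(loss)
```

If the output layer had 3 units instead of 4, `y_hat` would have shape (2, 3) and the comparison against `y` would fail in this sketch.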
So if you’re learning to classify an image as cat/non-cat, you need 1 output. If you’re learning bounding box coordinates, you need 4. If you’re learning object classification and object localization at the same time, you need 5. Does this make sense?
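Here is a rough sketch of those three cases with the Keras functional API. The function `build_heads`, the 64-dimensional feature input, and the layer names other than `bounding_box` are assumptions for illustration only, and the sigmoid on the classification unit is one common choice rather than something stated in this thread.

```python
import tensorflow as tf

def build_heads(feature_dim=64):
    # feature_dim is a hypothetical size for the feature vector feeding the heads.
    inputs = tf.keras.Input(shape=(feature_dim,))
    # Classification only (cat / non-cat): 1 output unit.
    cat_score = tf.keras.layers.Dense(units=1, activation="sigmoid",
                                      name="classification")(inputs)
    # Localization only: 4 output units, one per bounding box coordinate.
    bounding_box = tf.keras.layers.Dense(units=4, name="bounding_box")(inputs)
    # Classification and localization together: 1 + 4 = 5 values.
    combined = tf.keras.layers.Concatenate(name="class_and_box")(
        [cat_score, bounding_box])
    return tf.keras.Model(inputs=inputs,
                          outputs=[cat_score, bounding_box, combined])

model = build_heads()
# The last dimension of each output is the number of predicted values.
output_shapes = [tuple(out.shape) for out in model.outputs]
print(output_shapes)
```

The first dimension of each shape is the batch size, which stays unspecified until data is passed in. Only the last dimension has to match the labels.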
PS: other than memory and compute time, there is no limit to the number of output values a neural network can learn at the same time. The famous YOLO object detection algorithm outputs over 150,000 predictions each forward pass, at 40+ frames per second.
Thanks for the elaboration @ai_curious, I'm also learning from your answers.