import tensorflow as tf

def bounding_box_regression(inputs):
    # Dense layer with 4 output units, named 'bounding_box'
    bounding_box_regression_output = tf.keras.layers.Dense(units=4, name='bounding_box')(inputs)
    return bounding_box_regression_output

Why are 4 units used here?

Those 4 units are the coordinates of the bounding box, that's why!


@gent.spah has it exactly right. The shape of the output layer of your network must match the number of predicted values you want, your \hat{y}.

The loss function compares those outputs to your training data, the y values, which must be the same shape.
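To make the shape requirement concrete, here is a minimal, framework-free sketch. It assumes a mean squared error loss and made-up coordinate values; the function name `mse_loss` is hypothetical and is not part of the original code.

```python
def mse_loss(y_true, y_pred):
    """Mean squared error between a label and a prediction (assumed loss)."""
    # The loss is computed element by element, so both must have the same length.
    if len(y_true) != len(y_pred):
        raise ValueError(
            f"shape mismatch: label has {len(y_true)} values, "
            f"prediction has {len(y_pred)}"
        )
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only: a 4-value box label and a 4-value prediction.
y_true = [0.1, 0.2, 0.6, 0.8]
y_pred = [0.1, 0.2, 0.6, 0.4]
loss = mse_loss(y_true, y_pred)

# A 3-value prediction cannot be compared against a 4-value label.
try:
    mse_loss(y_true, [0.1, 0.2, 0.6])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

With 4 label values per example, the output layer has to produce 4 values too, which is what `units=4` does in the Dense layer from the question.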

So if you’re learning to classify an image as cat/non-cat, you need 1 output. If you’re learning bounding box coordinates, you need 4. If you’re learning object classification *and* object localization at the same time, you need 5 (1 class score plus 4 box coordinates). Does this make sense?
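As a rough illustration of the combined case, the sketch below treats a 5-value prediction as one class score followed by four box values. The ordering, the helper name `split_prediction`, and the numbers are all assumptions made for the example; a real model may lay out its outputs differently or use two separate output layers.

```python
def split_prediction(y_hat):
    """Split a 5-value prediction into (class_score, box_values).

    Assumed layout: index 0 is the class score, indices 1 to 4 are the box.
    """
    if len(y_hat) != 5:
        raise ValueError(f"expected 5 values, got {len(y_hat)}")
    class_score = y_hat[0]
    box_values = list(y_hat[1:])
    return class_score, box_values


# Illustrative prediction: 1 class score + 4 box values = 5 outputs.
y_hat = [0.9, 0.1, 0.2, 0.6, 0.8]
class_score, box_values = split_prediction(y_hat)
num_outputs = 1 + len(box_values)
```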

PS: other than memory and compute time, there is no limit to the number of output values a neural network can learn at the same time. The famous YOLO object detection algorithm outputs over 150,000 predictions each forward pass, at 40+ frames per second.
