I did not fully understand the concept of anchor boxes. What are they used for?

And why does the output shape of the YOLO algorithm need to depend on the number of anchor boxes? For example, if we have 2 anchor boxes, why does the output shape need to be, for instance, `19*19*2*8`? Why not `19*19*nb_of_classes*8`, meaning that the output shape depends on the number of classes?

At first, I thought that the number of anchor boxes was equal to the number of classes to detect, but I found out that this was not the case. So why is that? How do we choose the number of anchor boxes? And how do we choose their shapes?

I also did not understand how the anchor boxes fit into the YOLO algorithm.

Thank you for the clarification!

There are lots of good threads that cover YOLO on the forums. Here’s one that focusses on Anchor Boxes. Here’s another one which discusses how Anchor Boxes are derived.

Actually, neither of the two shapes in the question is correct for the YOLO output shape. I suggest taking another look at the papers or at the notebook's description of the YOLO output shape and the parameters that determine it.

Spoiler alert: for YOLO v2, which is what the exercise in this class is based on, the output shape is `S*S*B*(1+4+C)`, where S is the grid size, B is the number of anchor boxes, and C is the number of classes. The original v1 is similar but slightly different.
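To make that formula concrete, here is a minimal Python sketch. The function name `yolo_v2_output_shape` and the example values (a 19x19 grid, 5 anchor boxes, 80 classes) are assumptions picked for illustration, not something taken from a library or from this thread.

```python
def yolo_v2_output_shape(grid_size, num_anchors, num_classes):
    """Return the S x S x B x (1 + 4 + C) output shape as a tuple.

    Hypothetical helper for illustration only.
    """
    # Each anchor box predicts: 1 objectness score, 4 box coordinates,
    # and one score per class.
    values_per_box = 1 + 4 + num_classes
    return (grid_size, grid_size, num_anchors, values_per_box)


# Assumed example values: S = 19, B = 5, C = 80.
shape = yolo_v2_output_shape(grid_size=19, num_anchors=5, num_classes=80)
print(shape)  # (19, 19, 5, 85)
```

Note that the number of classes only changes the last dimension, while the number of anchor boxes gets its own dimension: every grid cell makes one full prediction (objectness, box, class scores) per anchor box.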

Let us know what you find.