IOU for the YOLO algorithm

When running the YOLO algorithm with multiple anchor boxes, once an object is detected, the algorithm finds which anchor box has the highest IOU with the predicted bounding box. My question is: how do you position the anchor box relative to the bounding box when calculating the IOU?
When we compute the IOU between true and predicted boxes, both boxes are well defined, which makes the calculation straightforward.
Also, how do we decide the size of the anchor box?

Also, would the number of anchor boxes be equal to the number of classes?

That isn’t completely correct. What you describe happens only when creating training data. There, IOU is used to assign one of the anchor boxes as ‘responsible’ for detecting the object. To be precise, the anchor box shape is compared to the labelled ground truth for the object, which is a bounding box. Since anchor boxes have no location of their own, the comparison is typically made as if the two boxes shared the same center, so only width and height matter. The anchor box shape also serves as the baseline for the predicted bounding box height and width, but IOU plays no role there.
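
To make the training-time comparison concrete, here is a minimal sketch of a shape-only IOU, assuming the anchor box and the ground truth box are treated as sharing one center so that position drops out. The function names and the anchor values are hypothetical and chosen only for illustration.

```python
def shape_iou(wh_a, wh_b):
    """IOU of two boxes compared by (width, height) only,
    as if both were centered on the same point (an assumption)."""
    w_a, h_a = wh_a
    w_b, h_b = wh_b
    # With a shared center, the overlap is the smaller width times the smaller height.
    inter = min(w_a, w_b) * min(h_a, h_b)
    union = w_a * h_a + w_b * h_b - inter
    return inter / union


def best_anchor(gt_wh, anchors):
    """Index of the anchor whose shape has the highest IOU with the ground truth shape."""
    return max(range(len(anchors)), key=lambda i: shape_iou(gt_wh, anchors[i]))


# Hypothetical anchor shapes (width, height), e.g. in grid-cell units.
anchors = [(1.0, 1.0), (2.0, 4.0), (4.0, 2.0)]

# A tall ground truth box is matched to the tall anchor.
print(best_anchor((1.8, 3.5), anchors))  # 1
```

The anchor at the returned index is the one marked ‘responsible’ for that object when the training targets are built.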

At runtime, IOU is not involved in making predictions. It is used to prune the list of candidate detections (non-max suppression), but that is the IOU of two predicted bounding boxes, neither of which is an anchor box.
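
The runtime pruning step can be sketched as follows. This is a simplified, single-class version of non-max suppression, assuming boxes are given as `(x1, y1, x2, y2)` corners. The function names, the threshold of 0.5, and the sample boxes are illustrative assumptions, not values from any particular YOLO implementation.

```python
def box_iou(a, b):
    """IOU of two fully located boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so boxes that do not overlap get an intersection of 0.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any box that overlaps
    an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping predictions and one separate prediction.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

Both arguments to `box_iou` here are predicted boxes with real coordinates, which is why no placement question arises at this stage.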

Anchor box shapes in YOLO are selected through unsupervised learning on the training data set, and represent the most common object shapes. The number to use is an engineering tradeoff between prediction accuracy and runtime computation cost. Anchor boxes have a height and width only; they have no location and no object class or type.


Also, would the number of anchor boxes be equal to the number of classes?

The short answer is ‘no’.

The number of anchor boxes is part of the calculation that determines the output shape of the network: it is a multiplicative factor. The exact number to use is driven by the business problem, but it is commonly fewer than 10, whereas the number of classes can easily be an order of magnitude larger, in the hundreds. Adding anchor boxes has diminishing marginal value, so beyond roughly 10 (or fewer) the extra prediction outputs cost computation time and memory that meager improvements in accuracy do not justify.
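
As a rough illustration of the multiplicative effect, the sketch below assumes a YOLO v2 style output layout in which each anchor box in each grid cell predicts 5 values (an objectness score plus 4 box coordinates) and one score per class. The function names and the 19 x 19 grid with 80 classes are assumptions chosen for the example.

```python
import math


def yolo_output_shape(grid_size, num_anchors, num_classes):
    """Output shape under an assumed YOLO v2 style layout:
    grid x grid x anchors x (objectness + 4 box values + class scores)."""
    per_anchor = 5 + num_classes
    return (grid_size, grid_size, num_anchors, per_anchor)


def num_outputs(grid_size, num_anchors, num_classes):
    """Total number of predicted values for one image."""
    return math.prod(yolo_output_shape(grid_size, num_anchors, num_classes))


print(yolo_output_shape(19, 5, 80))  # (19, 19, 5, 85)

# Doubling the anchor count doubles the number of predicted values.
print(num_outputs(19, 10, 80) / num_outputs(19, 5, 80))  # 2.0
```

The anchor count and the class count enter the shape independently, which is one way to see that nothing ties them to each other.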

The anchor box shapes for YOLO are selected by running K-means on the ground truth bounding boxes in the training data, and the number of anchor boxes is the number of centroids, K. This graph depicts accuracy versus the number of K-means centroids (number of anchor boxes) for one particular data set. You can clearly see that by 9 there isn’t much further gain. There is definitely no need for 90, or 900.

[Figure: Veggie Universe Anchor Box Kmeans from 1]
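
A curve like the one described can be produced with a sketch along these lines. It assumes K-means is run on the (width, height) pairs of the ground truth boxes, with each box assigned to the centroid of highest shape-only IOU (equivalent to the smallest 1 - IOU distance), and it measures quality as the mean best IOU. The function names and the random synthetic boxes are assumptions for illustration and are not the data behind the graph.

```python
import numpy as np


def iou_wh(boxes, centroids):
    """Pairwise shape-only IOU between (N, 2) boxes and (K, 2) centroids."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """K-means over box shapes; returns a (k, 2) array of anchor shapes."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Assign each box to the centroid with the highest IOU.
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        # Move each centroid to the mean shape of its boxes; keep it if it has none.
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids


def mean_best_iou(boxes, centroids):
    """Average, over all boxes, of the IOU with the best-matching anchor."""
    return iou_wh(boxes, centroids).max(axis=1).mean()


# Synthetic (width, height) pairs standing in for real labels.
rng = np.random.default_rng(42)
boxes = rng.uniform(0.5, 5.0, size=(500, 2))
for k in (1, 3, 5, 9):
    anchors = kmeans_anchors(boxes, k)
    print(k, round(float(mean_best_iou(boxes, anchors)), 3))
```

Plotting the printed score against `k` for a real label set is one way to judge where additional anchor boxes stop paying off.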
