This is one function they fulfill. The other is that they act as initializers for bounding box shape predictions. Not quite literally, but experiments showed that introducing anchor boxes shaped by analysis of the training data set improved the stability of the model during localization training. This happens because YOLO doesn’t directly predict bounding box shapes. Instead, it predicts numbers that are applied as scaling factors to the anchor box shapes. This is detailed in one of my older threads. I will look for it and provide a link.
It's always risky to come right out and say X is how YOLO works, because it evolved over time. The lectures in the course sometimes gloss over which version they are talking about. The programming exercise on autonomous cars is based on v2, and for that version I can say that predicted bounding boxes are never compared to anchor boxes.
When setting up the training data matrix, each ground truth bounding box is compared to the anchor box shapes using IOU in order to assign it to a specific location in the ground truth matrix Y.
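Here is a minimal sketch of that pre-training comparison, assuming shapes are given as (width, height) pairs and that the two boxes are treated as sharing a center, so only shape matters. The names `shape_iou` and `best_anchor` and the anchor values are made up for illustration; they are not from the course exercise.

```python
def shape_iou(box_wh, anchor_wh):
    """IOU of two boxes compared by shape only, as if they shared one center."""
    inter = min(box_wh[0], anchor_wh[0]) * min(box_wh[1], anchor_wh[1])
    union = box_wh[0] * box_wh[1] + anchor_wh[0] * anchor_wh[1] - inter
    return inter / union


def best_anchor(box_wh, anchors):
    """Index of the anchor whose shape has the highest IOU with the box."""
    return max(range(len(anchors)), key=lambda i: shape_iou(box_wh, anchors[i]))


# Hypothetical anchor shapes in grid-cell units: square, tall, wide.
anchors = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0)]
tall_box = (0.9, 2.5)  # a ground truth box that is taller than it is wide

idx = best_anchor(tall_box, anchors)  # picks the tall anchor, index 1
```

The returned index, together with the grid cell containing the box center, determines where the ground truth values are written in Y.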
During training, the network outputs a bounding box prediction for each grid cell + anchor box location in \hat{Y}. Each predicted bounding box is then compared to the ground truth bounding box in the corresponding location of Y. The shape of the anchor box plays no role in this comparison (which is performed inside the loss function).
At runtime, after forward propagation, predicted bounding boxes are compared to each other, also using IOU. Predicted bounding boxes whose IOU with each other is sufficiently high are deemed duplicates, and only the one with the highest confidence is retained.
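The runtime step can be sketched roughly like this. It is a simplified, class-agnostic version of non-max suppression, assuming boxes are (x1, y1, x2, y2) corners with positive area; the function names and the 0.5 threshold are my own choices for illustration, not the exercise's code.

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes to keep, highest confidence first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)  # second box duplicates the first
```

Note that only predicted boxes appear here. Anchor boxes are not part of this comparison at all.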
Summary
- Pre-training - ground truth bounding box compared to anchor box
- Training - ground truth bounding box compared to predicted bounding box
- Runtime - predicted bounding box compared to other predicted bounding boxes after forward prop completes
- Never - Anchor box compared to predicted bounding box
There are many previous threads related to anchor boxes.
This one has some discussion that overlaps significantly with this thread but might be a useful read / comparison. It includes the details of how anchor boxes (called priors in the YOLO v2 paper, hence the p in the equations) are used to compute predicted bounding box shape from the direct network outputs (called t_w and t_h).
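As a rough illustration of that computation, here is a sketch assuming the exponential form b_w = p_w * exp(t_w) and b_h = p_h * exp(t_h), where p_w and p_h are the anchor (prior) width and height. The function name `decode_shape` and the numbers are hypothetical.

```python
import math


def decode_shape(t_w, t_h, p_w, p_h):
    """Scale the anchor shape (p_w, p_h) by the exponentiated network outputs."""
    return p_w * math.exp(t_w), p_h * math.exp(t_h)


# With t_w = t_h = 0 the prediction is exactly the anchor shape,
# which is why the anchors behave like initial guesses.
b_w, b_h = decode_shape(0.0, 0.0, 2.0, 3.0)

# A positive t_w widens the box relative to the anchor: log(2) doubles it.
wide_w, same_h = decode_shape(math.log(2.0), 0.0, 2.0, 3.0)
```

So the network only has to learn a correction to a plausible starting shape rather than the shape itself.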
Hope this helps