Week3: Raw Output from YOLO to get final predictions

Hey @ai_curious

One topic that the lectures do not really touch on is how the raw output from YOLO is transformed into final predictions. In our programming assignment, that is basically whatever goes on inside the yolo_head function in the yad2k.models.keras_yolo file.

The bounding box xy coordinates go through a sigmoid activation plus some offsetting, and its height and width are exponentiated and then scaled by the anchor height and width. Any insights / intuition from anyone on what exactly is going on here?
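To make the question concrete, here is my rough understanding of that decoding step as a NumPy sketch. It is not the actual yad2k code: the function name `decode_boxes`, the `(H, W, A, 4)` layout, and the assumption that anchors are given as (width, height) in grid-cell units are all mine, for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_boxes(raw, anchors):
    """Hypothetical helper, not the yad2k API.

    raw:     (H, W, A, 4) raw network outputs (tx, ty, tw, th)
    anchors: (A, 2) anchor (width, height), assumed to be in grid-cell units
    Returns  (H, W, A, 4) boxes (x, y, w, h) as fractions of the image size.
    """
    grid_h, grid_w = raw.shape[:2]
    # Column (x) and row (y) index of every grid cell, shaped to broadcast over anchors.
    col = np.arange(grid_w).reshape(1, grid_w, 1)
    row = np.arange(grid_h).reshape(grid_h, 1, 1)
    # Sigmoid keeps the centre inside its own cell; adding the cell index is the "offsetting".
    x = (sigmoid(raw[..., 0]) + col) / grid_w
    y = (sigmoid(raw[..., 1]) + row) / grid_h
    # exp() gives a positive multiplier; the anchor supplies the base width and height.
    w = np.exp(raw[..., 2]) * anchors[:, 0] / grid_w
    h = np.exp(raw[..., 3]) * anchors[:, 1] / grid_h
    return np.stack([x, y, w, h], axis=-1)

# Example: a 2x2 grid with one anchor and all-zero raw outputs.
boxes = decode_boxes(np.zeros((2, 2, 1, 4)), np.array([[1.0, 2.0]]))
```

With all-zero raw outputs, each predicted centre sits in the middle of its cell and each box has exactly the anchor's shape, so the network only has to learn small corrections relative to the anchor.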

Was trying to read blogs online and here’s what one of them mentions…

Would be really helpful if someone can give more intuition around anchor boxes, I have a feeling that there’s a lot more that we need to grasp here…

Another thing: I just looked at the formula for the localization loss in YOLO. If it’s computed as the sum of the squared errors between the ground truth boxes and the predicted bounding boxes, then what role exactly do anchor boxes play? All I can grasp is that they are there to just hold multiple classes… getting very confused now with these anchor boxes!

I recommend you search the forum for the term “anchor box”. One of the community members has created many posts about exactly this topic.


Even I didn’t know it was that many. I need to get a life

@ai_curious, your service here is greatly admired.


Anchor boxes don’t play an explicit role in the localization loss. They influence the shapes of the predicted bounding boxes, and they determine which locations in the ground truth matrix have non-zero training data values. But their shapes are not explicitly part of the loss computation.

Also a reminder that the loss in YOLO is not just localization loss… it also includes classification and object presence/absence. And localization loss has two components: object center coordinates and shapes.
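As a rough illustration of how those pieces fit together, here is a simplified NumPy sketch. It assumes each grid cell / anchor slot is laid out as (x, y, w, h, objectness, class scores...), uses plain squared error for every term, and weights all terms equally. Real YOLO implementations weight the terms differently and vary in the details, and the function name `loss_components` is made up for this example.

```python
import numpy as np

def loss_components(pred, truth):
    """Hypothetical, simplified YOLO-style loss breakdown.

    pred, truth: arrays of shape (..., 5 + C), each slot laid out as
                 (x, y, w, h, objectness, C class scores).
    """
    # 1 where the ground truth places an object in this cell/anchor slot, else 0.
    obj = truth[..., 4]
    sq = (pred - truth) ** 2
    return {
        # Localization, part 1: object centre coordinates (only where an object exists).
        "center": np.sum(obj * sq[..., 0:2].sum(axis=-1)),
        # Localization, part 2: box shape (only where an object exists).
        "shape": np.sum(obj * sq[..., 2:4].sum(axis=-1)),
        # Object presence/absence, computed over every slot.
        "objectness": np.sum(sq[..., 4]),
        # Classification (only where an object exists).
        "classes": np.sum(obj * sq[..., 5:].sum(axis=-1)),
    }

# One cell, two anchor slots, two classes; only slot 0 holds an object.
truth = np.array([[[[0.5, 0.5, 0.2, 0.2, 1, 1, 0],
                    [0.0, 0.0, 0.0, 0.0, 0, 0, 0]]]])
pred = np.array([[[[0.6, 0.5, 0.2, 0.4, 1, 1, 0],
                   [0.9, 0.9, 0.9, 0.9, 0, 0, 0]]]])
parts = loss_components(pred, truth)
total = sum(parts.values())
```

Notice that the anchor shapes never appear in the computation. In this sketch the only trace of the anchors is which slot of `truth` is non-zero, so the box predicted in the empty slot contributes nothing to the localization terms.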

Here’s one of the YOLO posts from ai_curious about Anchor Boxes that I have bookmarked. That one describes how the Anchor Boxes are derived. Then it links to this one, which describes how they are actually used by the algorithm.

These two posts are super, thanks for pointing to them and of course thank you sir @ai_curious for writing them :slightly_smiling_face:
