What are anchor boxes doing? week 3, assignment 1

I think what you are missing is that this is all about multiple boxes being returned. There is no constraint that says there is only one per grid cell. The only constraint is that we use the threshold value to screen out the weaker-looking hits, so that we end up with only the highest-confidence predictions. But it’s still the case that there can be more than one per grid cell and even more than one per anchor box within a given grid cell.
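To make the thresholding idea concrete, here is a minimal sketch in plain Python. It is not the assignment's code: the toy `predictions` list, the `filter_boxes` helper, the two-class setup and the 0.6 threshold are all assumptions made up for illustration. It also assumes each (grid cell, anchor box) slot carries one box confidence and one vector of class probabilities, and that a box's score is the product of the two.

```python
from collections import defaultdict

# Hypothetical toy predictions, one entry per (grid cell, anchor box) slot:
# (row, col, anchor_index, box_confidence, class_probabilities)
predictions = [
    (0, 0, 0, 0.90, [0.80, 0.20]),
    (0, 0, 1, 0.85, [0.10, 0.90]),
    (0, 1, 0, 0.30, [0.50, 0.50]),
    (1, 1, 1, 0.95, [0.70, 0.30]),
]


def filter_boxes(predictions, threshold=0.6):
    """Keep every slot whose best class score reaches the threshold.

    Nothing here limits how many boxes a single grid cell may keep;
    the threshold is the only filter applied.
    """
    kept = []
    for row, col, anchor, confidence, class_probs in predictions:
        # Assumed scoring rule: box confidence times class probability.
        scores = [confidence * p for p in class_probs]
        best_class = max(range(len(scores)), key=scores.__getitem__)
        if scores[best_class] >= threshold:
            kept.append((row, col, anchor, best_class, scores[best_class]))
    return kept


kept = filter_boxes(predictions)

# Count how many boxes survived in each grid cell.
per_cell = defaultdict(int)
for row, col, *_ in kept:
    per_cell[(row, col)] += 1

for (row, col), count in sorted(per_cell.items()):
    print(f"cell ({row}, {col}): {count} box(es) kept")
```

In this toy data, cell (0, 0) keeps two boxes, one from each anchor, while cell (0, 1) keeps none because its only candidate scores below the threshold. The filter works purely on scores and never counts boxes per cell.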

YOLO is by far the most complex network that we’ve seen so far. There is a lot to consider here in order to really “grok” what is going on. I make no claim to really understand it yet. One good thing to do if you want to go deeper is to look at some of the explanatory posts about YOLO from @Ai_curious here on Discourse. Here’s a good one to start with. And here’s another to follow up from that.