[C4W3] YOLO grid question

Maybe take a look at this previous post, and see if it addresses your question?

That may be a lot to digest if it’s your first time really digging into this algorithm, but it shows the equations YOLO uses to relate predictions to anchor box and grid cell sizes. In my opinion, there's no other way to really understand it.
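To give a flavor of what those equations look like, here is a minimal sketch assuming the YOLOv2-style box parameterization; the linked post may use different notation. The names `decode_box`, `t_x`, `t_y`, `t_w`, `t_h` (raw network outputs), `c_x`, `c_y` (grid cell offsets), and `p_w`, `p_h` (anchor box size) are my own labels for illustration, and everything is measured in grid-cell units.

```python
import math


def sigmoid(x):
    """Squash a raw output into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Turn raw outputs into a box, assuming YOLOv2-style equations.

    (c_x, c_y) is the top-left corner of the grid cell, (p_w, p_h) is the
    anchor box size. All values are in grid-cell units (an assumption).
    """
    b_x = c_x + sigmoid(t_x)   # center is an offset inside the grid cell
    b_y = c_y + sigmoid(t_y)
    b_w = p_w * math.exp(t_w)  # size is a scaling of the anchor box
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


# Raw outputs of zero put the center mid-cell and keep the anchor's size.
print(decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=4, p_w=2.0, p_h=1.5))
```

Note that the grid cell and anchor only enter as an offset and a scale applied to the prediction. Nothing in these equations restricts which pixels the network looked at to produce the `t` values.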

The tl;dr is that every network output location, indexed by (m, S, S, B), produces a vector of (1 + 4 + C) predictions: 1 object confidence score, 4 box coordinates, and C class scores. Each vector is computed from the entire image, so the predictions are not constrained to the pixels inside the specific grid cell or anchor box shape they represent. That is one of the key things that differentiates YOLO from sliding windows and other region-based approaches.
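If it helps to see the bookkeeping, here is a small sketch of the output dimensions. The helper names `yolo_output_shape` and `predictions_per_image` are hypothetical, and the values S=19, B=5, C=80 are just illustrative assumptions, not something fixed by the algorithm.

```python
def yolo_output_shape(m, S, B, C):
    """Shape of the output tensor: one (1 + 4 + C) vector per location."""
    return (m, S, S, B, 1 + 4 + C)


def predictions_per_image(S, B, C):
    """Total count of numbers predicted for a single image."""
    return S * S * B * (1 + 4 + C)


# Illustrative values (assumed): 19x19 grid, 5 anchors, 80 classes.
shape = yolo_output_shape(m=1, S=19, B=5, C=80)
print(shape)                                    # (1, 19, 19, 5, 85)
print(predictions_per_image(S=19, B=5, C=80))   # 153425
```

All of those numbers come out of a single forward pass over the whole image, which is why no individual vector is limited to the contents of its own grid cell.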
