YOLO - How does the bounding box get identified when an object spans multiple sliding windows (grid cells)?

Hello Mentor Team,

Good day! I have been trying hard to visualize how the YOLO algorithm performs object classification with localization, and in particular how bounding boxes are identified for objects that cut across grid cells.

My understanding from the lecture (video: Bounding Box Predictions) is that each slice of the image (determined by the 3x3 or 19x19 grid) goes through the convolutional network to figure out whether an object exists, which one it is, and where it is.

In YOLO, all windows are processed in one pass through shared computation. My question is: if the object cuts across 2 grid cells (or 4 grid cells in the worst case), how does the bounding box get identified?

If each slice of the image that is convolved contains only a part of the car, how do the parts across grid cells get combined and a midpoint identified? It would be great if someone could shed some light on this. I hope I have framed my question clearly.

Thanks in Advance,
Prakash Janjanam

Several threads in the forum cover this. Maybe take a look and tell us what you find?

[Week 3 Yolo Doubt About Sliding Window - #3 by ai_curious]

[Quick question regarding YOLO algorithm]

[[C4W3] YOLO grid question]

[Detecting Multiple Objects using YOLO - Grid Cells plus Anchor Boxes]

The TL;DR is that grid cells in YOLO are not sliding windows. Unlike a sliding-window approach, YOLO does not actually divide the input image into subregions. The grid cells represent sets of predictions, each of which is made concurrently and each of which uses the entire input image.
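To make that concrete, here is a minimal sketch of how a ground-truth box could be turned into a training label, assuming the convention I recall from the Bounding Box Predictions lecture: the object is assigned to the single grid cell that contains its center, the center is stored relative to that cell, and the width and height are stored in units of cell size (so they can exceed 1 when the object spans several cells). The function name `encode_box` and the example coordinates are made up for illustration; this is not the course's or any library's actual code.

```python
def encode_box(x_min, y_min, x_max, y_max, img_w, img_h, grid=3):
    """Assign a box to one grid cell and encode it relative to that cell.

    Hypothetical helper for illustration only. Pixel coordinates in,
    (row, col, (bx, by, bw, bh)) out.
    """
    # The box center decides which single cell "owns" the object.
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2

    cell_w = img_w / grid
    cell_h = img_h / grid

    # Clamp so a center exactly on the right/bottom edge stays in range.
    col = min(int(cx // cell_w), grid - 1)
    row = min(int(cy // cell_h), grid - 1)

    # Center offset inside the owning cell, each in [0, 1).
    bx = cx / cell_w - col
    by = cy / cell_h - row

    # Size in units of one cell; greater than 1 means the box
    # extends beyond the owning cell into its neighbours.
    bw = (x_max - x_min) / cell_w
    bh = (y_max - y_min) / cell_h

    return row, col, (bx, by, bw, bh)


# A car that overlaps three columns and two rows of a 3x3 grid
# on a 300x300 image (made-up numbers).
row, col, box = encode_box(80, 120, 260, 240, img_w=300, img_h=300, grid=3)
print(row, col, box)
```

In this example only the middle cell (row 1, column 1) carries the label, and its width of about 1.8 cells and height of about 1.2 cells describe the whole car. Nothing is stitched together from neighbouring cells; the network learns to output these numbers for that cell while looking at features computed from the entire image.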

@ai_curious, thank you for pointing me to these resources. I shall go through them to understand this better.