Hello Mentor Team,
Good day! I have been trying hard to visualize how object classification with localisation works in the YOLO algorithm, and how bounding boxes are identified for objects that cut across grid cells.
My understanding from the lecture (video: Bounding Box Predictions) is that each slice of the image (determined by the 3x3 or 19x19 grid) goes through the convolutional network to figure out whether an object exists, which object it is, and where it is located.
In YOLO, all the windows are processed in one go through shared computation. My question is: if an object cuts across 2 grid cells (or 4 grid cells in the worst case), how does the bounding box get identified?
Each slice of the image that is convolved contains only a part of the car, so how do the parts of the object in different grid cells get combined and a midpoint identified? It would be great if someone could throw some light on this. I hope I have framed my doubt clearly.
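To make the question concrete, here is a minimal sketch of the situation, assuming a 300x300 image, a 3x3 grid, and a made-up box for the car (the function names and numbers are my own, not from the lecture). It only lists which grid cells the box touches and which cell holds the box's midpoint. It does not show how the network itself arrives at the box, which is the part I am unsure about.

```python
def cells_overlapped(box, image_size, grid):
    """Return the (row, col) grid cells that a box touches.

    box is (x_min, y_min, x_max, y_max) in pixels; the image is assumed square.
    """
    x_min, y_min, x_max, y_max = box
    cell = image_size / grid  # side length of one grid cell in pixels
    # Clamp to the last cell so a box ending on the image edge stays in range.
    col_lo = min(int(x_min // cell), grid - 1)
    col_hi = min(int(x_max // cell), grid - 1)
    row_lo = min(int(y_min // cell), grid - 1)
    row_hi = min(int(y_max // cell), grid - 1)
    return [(r, c)
            for r in range(row_lo, row_hi + 1)
            for c in range(col_lo, col_hi + 1)]


def midpoint_cell(box, image_size, grid):
    """Return the (row, col) of the single cell containing the box midpoint."""
    x_min, y_min, x_max, y_max = box
    cell = image_size / grid
    mid_x = (x_min + x_max) / 2
    mid_y = (y_min + y_max) / 2
    return (min(int(mid_y // cell), grid - 1),
            min(int(mid_x // cell), grid - 1))


# Hypothetical car box that cuts across four cells of a 3x3 grid.
car = (60, 140, 180, 230)
print(cells_overlapped(car, 300, 3))  # [(1, 0), (1, 1), (2, 0), (2, 1)]
print(midpoint_cell(car, 300, 3))     # (1, 1)
```

Here the car touches four cells, but its midpoint (120, 185) falls in only one of them, and that is exactly the case I am asking about.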
Thanks in advance,
Prakash Janjanam