Course4 Week3: Understanding YOLO Algorithm

arunnalpet · November 28, 2021, 10:38am

Hi,
I have several questions about how the YOLO algorithm works. Here are my questions:

1. In the course, Andrew mentions that image classification is applied to each individual “grid cell”. When an object spans multiple grid cells, how does the algorithm still identify the center point of the object and predict a bounding box that extends beyond a single grid cell? What actually happens when an object is spread across multiple grid cells?
2. In the forward pass, the algorithm may detect multiple bounding boxes for a single object. Why does this happen? The forward pass runs only once per image, so how do we end up with multiple bounding box predictions per image? I know they are eventually suppressed, but how does the algorithm produce so many bounding boxes in the intermediate stages?
3. For YOLO training, can we feed images that contain multiple objects, or should each image be cropped to a single class? How is a training image annotated? Will the label be 3x3x2x8, or a 1x8 vector representing a specific class?

ai_curious · November 28, 2021, 11:54am

Maybe take a look at this existing thread and let us know if it answers any of your questions?

arunnalpet · November 29, 2021, 4:53am

Thanks for pointing me to the detailed description!
A few things are clear now, but I still cannot connect all the dots.

For example, the earlier sliding window concept was very clear. But with YOLO, it is not clear to me how a single grid cell can detect an object that is larger than the cell. (The representation of the grid and its corresponding outputs is clear, however.)

Also, regarding my question no. 3: what would be the format of a single labelled training image?

ai_curious · November 29, 2021, 11:31am

[How to prepare bounding box labels - #5 by ai_curious]

ai_curious · November 29, 2021, 11:39am

Look specifically for the paragraph containing…

(this works even if this results in values larger than the grid cell dimension – it is exactly the mechanism that allows YOLO to predict bounding box shapes larger than one grid cell).
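
To make the quoted paragraph concrete, here is a minimal sketch of how one labelled training image could be encoded. It is an illustration, not the course's exact code: the 3x3 grid, the two anchor shapes, the three classes, the `[pc, bx, by, bh, bw, c1, c2, c3]` ordering, and the names `encode_labels` and `shape_iou` are all assumptions made for this example. Each object is assigned only to the grid cell that contains its center, while its width and height are stored in grid-cell units and are therefore free to exceed 1.

```python
S = 3                  # assumed S x S grid, as in the 3x3 lecture example
NUM_CLASSES = 3        # assumed number of classes
# Hypothetical anchor shapes, (width, height) as fractions of the image.
ANCHORS = [(0.3, 0.6),   # tall anchor
           (0.6, 0.3)]   # wide anchor


def shape_iou(wh, anchor_wh):
    """IoU of two boxes that share the same center (compares shape only)."""
    inter = min(wh[0], anchor_wh[0]) * min(wh[1], anchor_wh[1])
    union = wh[0] * wh[1] + anchor_wh[0] * anchor_wh[1] - inter
    return inter / union


def encode_labels(objects):
    """Build an S x S x num_anchors x (5 + NUM_CLASSES) nested-list label.

    objects: list of (cx, cy, w, h, class_id), all box values normalized
    to [0, 1] relative to the whole image.
    """
    depth = 5 + NUM_CLASSES
    y = [[[[0.0] * depth for _ in ANCHORS] for _ in range(S)]
         for _ in range(S)]
    for cx, cy, w, h, cls in objects:
        # Only the cell containing the object's center is responsible for it.
        col = min(int(cx * S), S - 1)
        row = min(int(cy * S), S - 1)
        # Pick the anchor whose shape best matches the object.
        a = max(range(len(ANCHORS)),
                key=lambda i: shape_iou((w, h), ANCHORS[i]))
        one_hot = [1.0 if k == cls else 0.0 for k in range(NUM_CLASSES)]
        y[row][col][a] = [
            1.0,            # pc: an object is present
            cx * S - col,   # bx: center offset inside the cell, in [0, 1)
            cy * S - row,   # by: center offset inside the cell, in [0, 1)
            h * S,          # bh: height in grid-cell units, may exceed 1
            w * S,          # bw: width in grid-cell units, may exceed 1
        ] + one_hot
    return y


# One image with two objects: a wide one spanning most of the image
# and a smaller tall one near the top-left corner.
objects = [(0.5, 0.5, 0.8, 0.4, 1),
           (0.15, 0.2, 0.2, 0.5, 0)]
labels = encode_labels(objects)
print(labels[1][1][1])  # label vector for the wide object
```

In this sketch the wide object covers 80% of the image width, so its stored width is about 2.4 grid cells, yet it occupies exactly one slot in the label: the center cell, wide anchor. Every other slot keeps pc = 0. Under these assumptions, a single image with multiple objects maps to one 3x3x2x8 label rather than to a separate 1x8 vector per crop.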
