I don't get why, with a 3x3 grid, it can still detect an object that overlaps two grid cells.

ai_curious · March 19, 2025, 2:18pm

+1 for calling out this really important point that isn’t always mentioned or emphasized. In YOLO v2, the predicted bounding box shape is the anchor box shape times a factor. The derived bounding box shape is used in the cost function, but the factor is what the network is learning to generate. This also reinforces why it is important to have a good set of anchor boxes; the closer the anchor box shapes are to the ground truth bounding box shapes, the faster and better the network learns the weights that produce better factors and lower localization error.
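A small numeric sketch can make the "anchor shape times a factor" relationship concrete, and it also shows why good anchors help. It assumes the YOLOv2 form b_w = p_w·e^{t_w}, b_h = p_h·e^{t_h}, with anchor and box sizes measured in grid-cell units. `decode_shape` and `target_offsets` are hypothetical helper names, and the anchor and ground-truth sizes are made up for illustration.

```python
import math

# Minimal sketch of YOLOv2-style box-shape decoding.
# Assumption: anchor and box sizes are in grid-cell units; helper names are hypothetical.

def decode_shape(t_w, t_h, anchor_w, anchor_h):
    """Predicted box shape = anchor shape times a learned factor exp(t)."""
    return anchor_w * math.exp(t_w), anchor_h * math.exp(t_h)

def target_offsets(gt_w, gt_h, anchor_w, anchor_h):
    """Raw outputs the network would need to produce to match a ground-truth shape."""
    return math.log(gt_w / anchor_w), math.log(gt_h / anchor_h)

# Hypothetical ground-truth box: 1.8 cells wide, 1.2 cells tall (larger than one cell).
gt = (1.8, 1.2)

# A well-matched anchor only needs small raw outputs (factor close to 1) ...
good_anchor = (1.6, 1.1)
# ... while a poorly matched anchor needs much larger ones.
poor_anchor = (0.4, 3.0)

for anchor in (good_anchor, poor_anchor):
    t_w, t_h = target_offsets(*gt, *anchor)
    w, h = decode_shape(t_w, t_h, *anchor)
    print(f"anchor={anchor} t=({t_w:.2f}, {t_h:.2f}) decoded=({w:.2f}, {h:.2f})")
```

Both anchors decode to the same 1.8 x 1.2 box, but the well-matched anchor gets there with raw outputs near 0 (a factor near 1), while the mismatched anchor needs much larger values. That is one way to read the point about anchor quality.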

@Zolids, there are many threads that go deep into the questions you posed above. Here is one that might be useful:

It contains the mathematical expressions for what @Alireza_Saei wrote, which I quoted above. From those equations you can see how a predicted bounding box can exceed the dimensions of a grid cell.
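To tie this back to the 3x3 question, here is a rough sketch of how one cell can predict a box that spills into its neighbor. It assumes a 300x300 image, YOLOv2-style decoding (b_x = σ(t_x) + c_x, b_w = p_w·e^{t_w}), and that the cell containing the object's center is the one responsible for it. `responsible_cell`, `decode_box`, the anchor, and the raw outputs are hypothetical values chosen for illustration.

```python
import math

# Minimal sketch. Assumptions: 300x300 image, 3x3 grid, YOLOv2-style decoding
# b_x = sigmoid(t_x) + c_x, b_w = p_w * exp(t_w), anchors in grid-cell units.
IMG_SIZE = 300
GRID = 3
CELL = IMG_SIZE / GRID  # 100 px per cell

def responsible_cell(cx_px, cy_px):
    """The cell containing the object's center is the one assigned to predict it."""
    return int(cy_px // CELL), int(cx_px // CELL)  # (row, col)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(t, cell_row, cell_col, anchor_w, anchor_h):
    """Decode raw outputs (t_x, t_y, t_w, t_h) into a pixel box (x1, y1, x2, y2)."""
    t_x, t_y, t_w, t_h = t
    # The center offset is squashed into (0, 1), so the center stays inside its cell ...
    bx = (cell_col + sigmoid(t_x)) * CELL
    by = (cell_row + sigmoid(t_y)) * CELL
    # ... but width and height are not bounded by the cell size.
    bw = anchor_w * math.exp(t_w) * CELL
    bh = anchor_h * math.exp(t_h) * CELL
    return bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2

# Hypothetical object: center at (130, 150) px, 120 px wide, 80 px tall.
row, col = responsible_cell(130, 150)
# Hypothetical raw outputs that reproduce that object with a 1x1-cell anchor.
t = (math.log(0.3 / 0.7), 0.0, math.log(1.2), math.log(0.8))
x1, y1, x2, y2 = decode_box(t, row, col, anchor_w=1.0, anchor_h=1.0)
print(row, col, [round(v, 1) for v in (x1, y1, x2, y2)])
```

Here the object's center falls in row 1, column 1, so that single cell predicts it, yet the decoded box runs from x = 70 to x = 190 and crosses into column 0 (x from 0 to 100). The sigmoid keeps the center inside its own cell; nothing limits the width and height to the cell size.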
