I don't get why, with a 3x3 grid, it can still detect an object that overlaps 2 grid cells?

Zolids · March 12, 2025, 9:16am

I don't get why the pedestrian there has a bounding box larger than a grid cell. I don't understand how the grid cells communicate with each other. Isn't the maximum of bh and bw only 1 per grid cell?

Also, please explain how the bx, by (center) point is used in the 3x3 grid. What is its relevance?

Alireza_Saei · March 12, 2025, 12:18pm

Hi @Zolids

The YOLO network predicts adjustments to predefined anchor boxes of varying dimensions, and its convolutional layers learn features across the entire image, which lets a single cell capture objects that span multiple cells. The bx and by values, normalized between 0 and 1, define the bounding box center relative to the top-left corner of its grid cell, which gives precise localization within the coarse grid. The bh and bw values are measured in units of one cell but are not capped at 1, so a box can be wider or taller than the cell that contains its center.
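
To make the coordinate convention concrete, here is a minimal sketch that converts a cell-relative box into pixel corners. The function name `cell_box_to_image`, the 300-pixel image size, and the example values are assumptions chosen for illustration, not taken from the course code.

```python
def cell_box_to_image(row, col, bx, by, bw, bh, grid_size=3, image_size=300):
    """Convert a box given in grid-cell units to pixel corners.

    bx, by: box center as offsets in [0, 1] from the cell's top-left corner.
    bw, bh: box width and height in units of one cell (values > 1 are allowed).
    image_size is a hypothetical square image side in pixels.
    """
    cell = image_size / grid_size        # side of one grid cell in pixels
    center_x = (col + bx) * cell         # cell index plus offset gives the absolute center
    center_y = (row + by) * cell
    width = bw * cell
    height = bh * cell
    x_min = center_x - width / 2
    y_min = center_y - height / 2
    return (x_min, y_min, x_min + width, y_min + height)


# A tall, narrow "pedestrian" whose center falls in the middle cell (row 1, col 1).
box = cell_box_to_image(row=1, col=1, bx=0.5, by=0.5, bw=0.5, bh=2.5)
print(box)  # (125.0, 25.0, 175.0, 275.0)
```

With these assumed numbers the box is 250 pixels tall while a cell is only 100 pixels tall, so it covers parts of all three rows even though only the middle cell is responsible for it.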

Hope it helps! Feel free to ask if you need further assistance.

ai_curious · March 19, 2025, 2:18pm

+1 for calling out this really important point that isn’t always mentioned or emphasized. In YOLO v2, the predicted bounding box shape is the anchor box shape times a factor. The derived bounding box shape is used in the cost function, but the factor is what the network is learning to generate. This also reinforces why it is important to have a good set of anchor boxes; the closer the anchor box shapes are to the ground truth bounding box shapes, the faster and better the network learns the weights that produce better factors and lower localization error.
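
As a rough sketch of the "anchor shape times a factor" idea, the YOLO v2 paper decodes raw outputs with bx = sigmoid(tx) + cx, by = sigmoid(ty) + cy, bw = pw * exp(tw), bh = ph * exp(th). The function name `decode_yolo_v2` and the anchor and output values below are made up for illustration. Note that in this formulation bx and by are measured from the top-left of the whole grid, not from the cell corner.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_yolo_v2(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw outputs (tx, ty, tw, th) for one anchor in one cell.

    cx, cy: column and row index of the cell (its top-left corner in cell units).
    pw, ph: anchor box width and height in cell units.
    Returns (bx, by, bw, bh) in cell units measured over the whole grid.
    """
    bx = cx + sigmoid(tx)    # sigmoid keeps the center inside the responsible cell
    by = cy + sigmoid(ty)
    bw = pw * math.exp(tw)   # anchor width times a learned positive factor
    bh = ph * math.exp(th)   # anchor height times a learned positive factor
    return bx, by, bw, bh


# Hypothetical tall anchor (0.6 x 1.8 cells) in the middle cell of a 3x3 grid.
bx, by, bw, bh = decode_yolo_v2(tx=0.0, ty=0.0, tw=0.0, th=0.5,
                                cx=1, cy=1, pw=0.6, ph=1.8)
print(bx, by, bw, bh)
```

Because exp(th) is unbounded above, bh here comes out larger than 1 cell, and an anchor that already resembles a pedestrian only needs a factor close to 1.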

@Zolids there are many threads that go deep on the questions you pose above. Here is one that might be useful:

It contains the mathematical expressions for what @Alireza_Saei wrote and that I quoted above. From those equations you can see how a predicted bounding box shape can exceed the dimension of a grid cell.
