Questions on bounding box predictions YOLO

gulshanm · August 16, 2025, 9:53am

I am going through the bounding box predictions section of YOLO in week 3, and I am confused about the case where the ground truth (GT) bounding box is larger than a grid cell (as in the 19x19 grid case).

The lecture says that for each object in the training image, we pick the single cell that contains the midpoint of the object's GT bounding box and assign it a label containing b_x, b_y, b_h, b_w (scaled in the relative frame of the grid cell, which runs from (0,0) to (1,1)). But if the GT bounding box is larger than the grid cell and completely covers it, how are the b_w and b_h values defined? Would they be > 1.0 to indicate the height / width of the GT box (still in the relative frame of the grid cell), or would they be clamped at 1.0? If it's the latter, I would be very confused as to how the network learns anything useful.

Could someone please help clarify?

rmwkwok · August 16, 2025, 12:50pm

Hello, @gulshanm,

For our lectures, yes, the width and height will be larger than 1 if they exceed the size of a grid cell. Refer to the following video for the discussion:

Note that the 2015 YOLO paper calculates the width and height relative to the size of the whole image, so they are always between 0 and 1. Since the image size and the grid size are fixed, the approaches of the paper and the course differ only by constant multiplicative factors.
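As a rough illustration of the two conventions, here is a minimal sketch of how a label could be built in the grid-cell frame. The function name `encode_box`, the pixel-coordinate inputs, and the 608x608 image size are assumptions made for this example, not code from the course or the paper.

```python
def encode_box(cx, cy, w, h, img_w, img_h, grid=19):
    """Encode a GT box given in pixels (center cx, cy; size w, h)
    relative to the grid cell that contains its midpoint.

    Hypothetical helper for illustration only.
    """
    cell_w = img_w / grid
    cell_h = img_h / grid

    # Cell that owns the object: the one containing the box midpoint.
    # min(...) keeps a midpoint on the far image edge inside the grid.
    col = min(int(cx // cell_w), grid - 1)
    row = min(int(cy // cell_h), grid - 1)

    # Midpoint offset inside the owning cell, in [0, 1].
    b_x = cx / cell_w - col
    b_y = cy / cell_h - row

    # Size in units of one grid cell: NOT clamped, so it can exceed 1.
    b_w = w / cell_w
    b_h = h / cell_h
    return row, col, b_x, b_y, b_w, b_h


# Assumed example: 608x608 image, 19x19 grid, so each cell is 32x32 pixels.
row, col, b_x, b_y, b_w, b_h = encode_box(304, 100, 80, 48, 608, 608)
print(row, col, b_x, b_y, b_w, b_h)

# Image-relative width (the paper's convention) is the same quantity
# divided by the constant grid size.
w_image_relative = b_w / 19
print(w_image_relative, 80 / 608)
```

In this sketch the box is 2.5 cells wide and 1.5 cells tall, so b_w and b_h are both above 1, while the image-relative width stays below 1.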

Cheers,
Raymond

gulshanm · August 18, 2025, 4:24pm

Thanks for the clarification! Yes, the same lecture video confused me a bit, because 0.9 also looks visually like the ratio of the GT bounding box's intersection with the grid cell along the width dimension. But I guess the key point is that we are not interested in the width and height of the intersection between the GT bounding box and the grid cell; we want the width and height of the actual GT bounding box, just scaled with respect to the grid cell dimensions.

Please let me know if my understanding is correct.

rmwkwok · August 19, 2025, 12:09am

Hello @gulshanm, yes, you are right that b_w is just $\frac{\text{width of bounding box}}{\text{width of grid cell}}$ and it has nothing to do with “intersection”. Even if we move the center of the bounding box almost to a corner of the grid cell (while keeping it inside), the value of b_w does not change.
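To make the difference concrete, here is a small sketch along one dimension that compares b_w with the width of the box/cell intersection as the center moves. The helper names `relative_width` and `intersection_width` and the numbers (a 32-pixel cell, a 24-pixel box) are assumptions chosen for the example.

```python
def relative_width(box_w, cell_w):
    """b_w: box width in units of one grid cell (independent of position)."""
    return box_w / cell_w


def intersection_width(cx, box_w, cell_left, cell_w):
    """Width of the overlap between the box and the cell along x."""
    left = max(cx - box_w / 2, cell_left)
    right = min(cx + box_w / 2, cell_left + cell_w)
    return max(0.0, right - left)


cell_left, cell_w = 0.0, 32.0   # assumed cell spanning x in [0, 32]
box_w = 24.0                    # assumed box width in pixels

for cx in (16.0, 30.0):         # centered, then near the cell's right edge
    b_w = relative_width(box_w, cell_w)
    overlap = intersection_width(cx, box_w, cell_left, cell_w) / cell_w
    print(f"cx={cx}: b_w={b_w}, overlap ratio={overlap}")
```

With the box centered in the cell, the two numbers happen to coincide, which may be why the figure in the lecture can be read either way. Once the center moves toward the edge, the overlap ratio shrinks while b_w stays the same.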

Cheers,
Raymond

ai_curious · August 19, 2025, 1:07pm

You might also want to review this thread, which goes into some detail on the relationships between predicted bounding box dimensions, grid cells, and anchor box shapes.

Also, keep in mind that sometimes Prof Ng presents at a somewhat conceptual level, which may or may not translate directly to how the algorithms are implemented in the related programming exercises. In the case of YOLO, there are some nuances he understandably glosses over or simplifies for the lecture/video.
