Questions on bounding box predictions in YOLO

I am going through the Bounding box predictions section of YOLO in week 3, and I am confused about the case where the ground truth (GT) bounding box is larger than a grid cell (as in the 19x19 grid case).

The lecture says that for each object in the training image, we pick the single cell that contains the midpoint of the object's GT bounding box and assign it a label containing b_x, b_y, b_h, b_w (scaled in the relative frame of the grid cell, which runs from (0,0) to (1,1)). But if the GT bounding box is larger than the grid cell and completely covers it, how are the b_w and b_h values defined? Would they be > 1.0 to indicate the height / width of the GT box (still in the relative frame of the grid cell), or would they be clamped at 1.0? If it's the latter, I would be very confused as to how the network learns anything useful.

Could someone please help clarify?

Hello, @gulshanm,

In our lectures, yes, the width and height will be larger than 1 if the box exceeds the size of a grid cell. Refer to the following video for the discussion:

Note that the 2015 YOLO paper calculates the width and height relative to the size of the whole image, so they are always between 0 and 1. Since the image size and the grid size are fixed, the approaches of the paper and the course differ only by constant multiplicative factors.
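To make the two conventions concrete, here is a minimal sketch of how a GT box could be encoded in the course's grid-cell frame and then converted to the paper's whole-image frame. The function name `encode_box`, the 608x608 image size, and the example box are assumptions for illustration only, not taken from the lecture or the assignment code.

```python
def encode_box(cx, cy, w, h, img_size=608, grid=19):
    """Encode a GT box (center cx, cy and size w, h, all in pixels)
    relative to the grid cell that contains its midpoint.

    img_size and grid are assumed values: a square 608x608 image
    split into a 19x19 grid gives 32-pixel cells.
    """
    cell = img_size / grid            # side length of one grid cell in pixels
    col = int(cx // cell)             # cell column containing the midpoint
    row = int(cy // cell)             # cell row containing the midpoint
    b_x = cx / cell - col             # midpoint offset inside the cell, in [0, 1)
    b_y = cy / cell - row
    b_w = w / cell                    # NOT clamped: exceeds 1 for large boxes
    b_h = h / cell
    return row, col, b_x, b_y, b_h, b_w


# Hypothetical box: 96 px wide, 48 px tall, centered at (304, 200).
row, col, b_x, b_y, b_h, b_w = encode_box(cx=304, cy=200, w=96, h=48)
print(row, col, b_x, b_y, b_h, b_w)   # b_w and b_h are both greater than 1 here

# Paper-style width (relative to the whole image) differs only by the
# constant factor 1 / grid.
w_rel_image = b_w / 19
```

With these assumed numbers the box is three cells wide, so b_w is 3.0, and dividing by the grid size recovers the image-relative width used in the paper.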

Cheers,
Raymond

Thanks for the clarification! Yes, that same lecture video confused me a bit, because visually 0.9 is also the fraction of the grid cell's width that the GT bounding box overlaps. But I guess the key point is that we are not interested in the width and height of the intersection between the GT bounding box and the grid cell. We want the width and height of the actual GT bounding box, just scaled with respect to the grid cell dimensions.

Please let me know if my understanding is correct :folded_hands:

Hello @gulshanm, yes, you are right that b_w is just $\frac{\text{width of bounding box}}{\text{width of grid cell}}$ and has nothing to do with “intersection”. Even if we move the center of the bounding box to nearly a corner of (and still inside) the grid cell, the value of b_w does not change.
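As a quick check of this point, the sketch below compares b_w with the overlap width between the box and its grid cell, along the width dimension only. The helper names `b_w_label` and `overlap_ratio`, the 32-pixel cell, and the 24-pixel box are hypothetical values chosen for illustration.

```python
def b_w_label(box_w, cell):
    """Width label: box width divided by cell width (no intersection involved)."""
    return box_w / cell


def overlap_ratio(cx, box_w, cell):
    """Width of the overlap between the box and the cell containing its
    midpoint, as a fraction of the cell width. Shown only for contrast;
    this is NOT what the label stores."""
    col = int(cx // cell)
    cell_left, cell_right = col * cell, (col + 1) * cell
    left, right = cx - box_w / 2, cx + box_w / 2
    overlap = max(0.0, min(right, cell_right) - max(left, cell_left))
    return overlap / cell


cell = 32        # assumed cell width in pixels
box_w = 24       # assumed box width in pixels (smaller than the cell)

centered = 304   # midpoint in the middle of the cell spanning 288..320
near_edge = 290  # midpoint near the left edge of the same cell

label_centered = b_w_label(box_w, cell)
label_near_edge = b_w_label(box_w, cell)   # same inputs: center is irrelevant
overlap_centered = overlap_ratio(centered, box_w, cell)
overlap_near_edge = overlap_ratio(near_edge, box_w, cell)
print(label_centered, label_near_edge, overlap_centered, overlap_near_edge)
```

When the box is centered in the cell the two quantities happen to coincide, which may be why the lecture figure reads as an intersection ratio. Once the midpoint moves toward the edge, the overlap shrinks while b_w stays the same.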

Cheers,
Raymond

You might also want to review this thread, which goes into some detail on the relationships between predicted bounding box dimensions, grid cells, and anchor box shapes.

Also, keep in mind that sometimes Prof Ng presents at a somewhat conceptual level, which may or may not translate directly to how the algorithms are implemented in the related programming exercises. In the case of YOLO, there are some nuances he understandably glosses over or simplifies for the lecture/video.