Question / confusion about YOLO

aiai1 · October 26, 2025, 1:42am

I have watched Andrew’s lectures many times over and am still struggling to understand all the details of YOLO, mostly regarding what happens when an object spans more than one grid cell. The only answer I find repeatedly given to this question is the canned line “only one grid cell is responsible for predicting the object” or “the object is assigned to only one grid cell”, without any detail as to how this is actually accomplished. The big question is: when the number of grid cells is more than 3x3, say 19x19, it seems quite clear that more than one grid cell will predict the object with high probability. But during training of YOLO we are forcing it to stop predicting the object in the cells to which it was not “assigned”. In other words, aren’t we training it to ignore major parts of the object that fall in non-assigned grid cells, and making it use just the part that falls in the “assigned” grid cell? Wouldn’t that confuse the network and ultimately degrade its prediction ability?

lambliu · October 26, 2025, 2:10am

The training method of assigning an object to a single, center-based grid cell is not about forcing the network to ignore pixels. Instead, it is a highly effective training regularization that forces the network to:

Unambiguously identify the location of the object’s center.
Use its broad receptive field to aggregate information about the entire object (even the parts in neighbor cells).
Produce only one high-confidence prediction per object, greatly simplifying the output and improving speed.

While non-assigned cells are trained to be silent (Objectness=0), their visual input still contributes indirectly to the final accurate prediction made by the responsible cell.

ai_curious · October 26, 2025, 2:13am

Also see

And

Or even

aiai1 · October 26, 2025, 10:38pm

So you are saying that:

Objectness is zero for all neighboring grid cells when the centroid does not lie in them, EVEN when they contain large parts of the object?
Objectness is 1 for the grid cell in which the centroid lies, and that cell uses large parts of the object from neighboring grid cells due to the convolutional application of sliding windows, as discussed in this slide?
[image: slide, 867×349]

TMosh · October 27, 2025, 12:40am

Yes, and Yes.

ai_curious · October 27, 2025, 9:58am

@aiai1 I agree with these assertions from @TMosh. However, there are nuances.

First, the assignment of 1 and 0 as the objectness value happens when the labeled training data is created, that is, when Y is populated. The output of forward propagation, \hat{Y}, contains predicted values, not assigned values.
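
To make the label-creation step concrete, here is a minimal sketch of how Y could be populated. It is not the course's actual code. It assumes one anchor box per cell, box coordinates normalized to the image, and a simplified per-cell layout of [objectness, bx, by, bw, bh, class_id]; the function name `build_targets` is hypothetical.

```python
def build_targets(boxes, grid_size=19):
    """Populate Y from ground-truth boxes.

    boxes: list of (cx, cy, w, h, class_id), with cx, cy, w, h given as
    fractions of the image width/height (an assumption of this sketch).
    """
    # Every cell starts "silent": objectness 0 and no box.
    Y = [[[0.0] * 6 for _ in range(grid_size)] for _ in range(grid_size)]
    for cx, cy, w, h, class_id in boxes:
        # Only the cell containing the box CENTER is assigned the object.
        col = min(int(cx * grid_size), grid_size - 1)
        row = min(int(cy * grid_size), grid_size - 1)
        # Center offset inside the assigned cell, each in [0, 1).
        bx = cx * grid_size - col
        by = cy * grid_size - row
        # Width and height in units of one cell; these can exceed 1,
        # so the labeled box is free to spill into neighboring cells.
        bw = w * grid_size
        bh = h * grid_size
        Y[row][col] = [1.0, bx, by, bw, bh, float(class_id)]
    return Y


# An object centered in the image that covers 40% x 30% of it.
Y = build_targets([(0.5, 0.5, 0.4, 0.3, 2)], grid_size=19)
print(Y[9][9])     # the one assigned cell; its box is about 7.6 cells wide
print(Y[9][8][0])  # a neighbor covered by the object still has objectness 0
```

Note that the neighbor's zero is a label in Y, not a statement about which pixels the network may look at.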

Second, YOLO works because the input to forward propagation is the entire input image, X, not a sub-region, which is the case in sliding windows. The grid cells are not decompositions of that input. They influence the shape of the network output, not its input. Labeled ground truth bounding box shapes in Y are not constrained to lie within a single grid cell, and neither are the predicted bounding boxes in the network output \hat{Y}.
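
As a rough illustration of why an output cell is not limited to "its" pixels, the sketch below traces the output size and receptive field through a stack of convolutions. The stack of five 3x3, stride-2, padding-1 layers is purely hypothetical and much shallower than any real YOLO backbone; the 608 input and 19x19 grid are used only as example numbers, and `trace_stack` is an illustrative name.

```python
def trace_stack(input_size, layers):
    """layers: list of (kernel, stride, padding), one tuple per conv layer."""
    size = input_size
    receptive_field = 1  # input pixels (per side) seen by one output unit
    jump = 1             # spacing in input pixels between adjacent units
    for kernel, stride, padding in layers:
        size = (size + 2 * padding - kernel) // stride + 1
        receptive_field += (kernel - 1) * jump
        jump *= stride
    return size, receptive_field, jump


# Hypothetical stack: five 3x3 convolutions with stride 2 and padding 1.
grid, rf, cell = trace_stack(608, [(3, 2, 1)] * 5)
print(grid, rf, cell)  # 19 63 32
```

Even in this toy stack, each of the 19x19 output positions corresponds to a 32-pixel cell but sees a 63-pixel-wide patch of the input, so it already draws on pixels from neighboring cells. Deeper networks widen this considerably.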

Finally, despite best efforts, training and runtime prediction are imperfect. It is possible that more than one grid cell of \hat{Y} will end up with a non-zero objectness predicted value (and bounding box shape and location and class) for the same object in the input image. During training the cost function attempts to correct this. At runtime, we use non-max suppression to disambiguate.
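
For the runtime step, here is a minimal sketch of greedy non-max suppression. It assumes boxes are given as corner coordinates (x1, y1, x2, y2) in pixels, handles a single class, and uses illustrative threshold values; the function names and the (score, box) layout are choices made for this example, not a specific library's API.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, score_threshold=0.6, iou_threshold=0.5):
    """detections: list of (score, box). Returns the surviving detections."""
    # Drop low-confidence predictions, then visit the rest best-first.
    candidates = sorted(
        (d for d in detections if d[0] >= score_threshold),
        key=lambda d: d[0],
        reverse=True,
    )
    kept = []
    for score, box in candidates:
        # Keep a box only if it does not heavily overlap a better one.
        if all(iou(box, kept_box) < iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


detections = [
    (0.9, (10, 10, 50, 50)),      # responsible cell's prediction
    (0.7, (12, 12, 52, 52)),      # neighboring cell, same object
    (0.8, (100, 100, 140, 140)),  # a different object
    (0.3, (11, 11, 51, 51)),      # weak prediction, filtered by score
]
print(non_max_suppression(detections))
```

The two overlapping predictions for the first object collapse to the higher-scoring one, while the separate object survives.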

ai_curious · October 27, 2025, 4:14pm

Here are yet more examples of how this is actually accomplished. YOLO is a step up in complexity from what has been covered previously in the lectures; hopefully these historical threads help clear things up a little.

Topic	Replies	Views	Activity
Week 3: finding the correct cell in YOLO Convolutional Neural Networks coursera-platform	7	755	October 26, 2025
How does a cell detect a bounding box bigger than itself, YOLO? Convolutional Neural Networks coursera-platform	6	901	July 10, 2021
YOLO Algorithm and grid cells Convolutional Neural Networks week-module-3 , coursera-platform	11	166	March 19, 2025
How does YOLO know if 3 cells make 1 object? Convolutional Neural Networks coursera-platform	3	642	August 14, 2023
Question on YOLO and sliding window detection Convolutional Neural Networks week-module-3	4	49	July 29, 2025
