I’m watching this video, Non-max Suppression, and have a question about an example presented in the video. In this example, the detection area is divided into a 19x19 grid. We are trying to detect objects in each cell, but I noticed that each cell (highlighted in green and yellow) is significantly smaller than the car being detected. Given that the training examples are presumably entire cars, how does the algorithm accurately detect parts of a car (like a door, a window, or a wheel) within these small cells? Specifically, how does it recognize that these individual components should be labeled as a car?
I’m trying to understand the connection between the small-scale detection in each grid cell and the labeling process, given that training is done on whole objects. Any insights or explanations would be greatly appreciated!
The detection of objects happens independently of the grid cells, and there is no requirement that an object be contained in a grid cell. The grid cells are just used to organize the output: a given detected object is attached to the grid cell that contains its centroid. The network is trained to detect whole objects, and that is driven (as in all “supervised learning” cases) by how the input training data is labeled.
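To make the centroid idea concrete, here is a minimal sketch of how a labeled box for a whole object gets assigned to exactly one grid cell when building a training target. This is not the course’s actual code; the grid size, box format, class count, and the `encode_label` helper are all assumptions chosen for illustration.

```python
import numpy as np

GRID = 19          # 19x19 grid from the lecture example (assumed)
NUM_CLASSES = 3    # e.g. car, pedestrian, motorcycle (assumed)

def encode_label(box, class_id, img_w, img_h):
    """box = (x_min, y_min, x_max, y_max) in pixels for one whole object."""
    # Target layout per cell: [p_c, bx, by, bw, bh, class one-hot]
    target = np.zeros((GRID, GRID, 5 + NUM_CLASSES))

    # Centroid of the whole object, normalized to [0, 1]
    cx = (box[0] + box[2]) / 2.0 / img_w
    cy = (box[1] + box[3]) / 2.0 / img_h

    # The single grid cell responsible for this object
    col = int(cx * GRID)
    row = int(cy * GRID)

    target[row, col, 0] = 1.0                            # objectness
    target[row, col, 1:3] = [cx * GRID - col,             # centroid offset
                             cy * GRID - row]             # inside that cell
    target[row, col, 3:5] = [(box[2] - box[0]) / img_w,   # width and height
                             (box[3] - box[1]) / img_h]   # relative to image
    target[row, col, 5 + class_id] = 1.0                  # one-hot class
    return target

# A car spanning many cells is still encoded in just one cell:
# the one containing its center point.
y = encode_label((120, 200, 480, 380), class_id=0, img_w=608, img_h=608)
print(np.argwhere(y[..., 0] == 1))   # exactly one (row, col) pair
```

Note that the box width and height can be (and often are) much larger than one cell; the cell only anchors the prediction, it does not bound the object.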
There are a number of threads on the forums that go into quite a bit more depth on how YOLO works and is trained than we get in the lectures or the assignments. Here’s a good one to start with that discusses how the training works.
Unfortunately, the language used by Prof Ng here is not a precise description of what the YOLO algorithm does. At runtime, YOLO inputs the entire image once, runs forward propagation once, and outputs one matrix containing all the predictions for all of the grid cells. Prof Ng refers to this explicitly at 3:43 of the YOLO algorithm video. YOLO does not run a CNN forward propagation per grid cell, as might reasonably be inferred from the transcript excerpts in this thread.
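Here is a toy sketch of that point: a single forward pass over the whole image yields a 19x19 grid of predictions in one output tensor. The 608x608 input size, 5 anchors, 80 classes, and the made-up five-stage backbone are all assumptions for illustration, not the actual YOLO architecture.

```python
import tensorflow as tf

B, C = 5, 80                       # anchors per cell, class count (assumed)
inputs = tf.keras.Input(shape=(608, 608, 3))
x = inputs
for filters in (32, 64, 128, 256, 512):   # five stride-2 stages: 608 -> 19
    x = tf.keras.layers.Conv2D(filters, 3, strides=2, padding="same",
                               activation="relu")(x)
outputs = tf.keras.layers.Conv2D(B * (5 + C), 1)(x)  # raw predictions per cell
model = tf.keras.Model(inputs, outputs)

image = tf.random.uniform((1, 608, 608, 3))
pred = model(image)                # ONE forward pass over the whole image
print(pred.shape)                  # (1, 19, 19, 425): all 19x19 cells at once
```

The grid structure comes from the spatial shape of the output volume, not from cropping the image into cells and running the network on each crop.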
Andrew tends to lecture using broad intuitions that easily convey the concepts. He often omits (or simplifies) many of the specific details, since he cannot know how much prior experience the audience has.
I agree. The language used in these videos conceptually kind of straddles the boundary between convolutional sliding windows, introduced previously, and YOLO, introduced subsequently, without explicit reference to either.
Another challenge is that these lectures are a snapshot in time. They might describe a version of an algorithm that was current when the video was recorded, but not reflect current practice or even the latest state of the related programming exercises. I think these videos are from 2017-ish.