First you need to clarify which object detection algorithm you are asking about. In YOLO, which is normally what is being discussed in this class when we see 19x19 grids, there is no ‘joining’ because each prediction is of a complete object. Predicted bounding boxes are not constrained to fit within a single grid cell. The mechanism for that is covered in (several) existing threads.

Glad that helped. An important takeaway from the equations for the predicted bounding box center location and shape is that the YOLO CNN does not output them directly. Rather, the net outputs values that are cleverly set up to be on the same scale (i.e. order of 1) as the object presence and class confidence/probability values. This lets them all play nicely together in the loss function and be treated as a single overall regression, rather than requiring separate pipelines and models for the classification and regression elements.

The shape equations also show the importance of choosing good anchor boxes, or priors, since they are multiplicative factors in the shape outputs. Mathematically, I guess the shape could be anywhere between 0 and positive infinity pixels. Practically, the lower bound is at least 1, since a bounding box can’t have less than 1 pixel of height and width, and probably really 3 or more, since you’re unlikely to get usable features out of objects any smaller. The upper bound is the size of the input image itself. When establishing the training data, you reverse engineer from the actual ground-truth shapes to the t_i values the network would need to output to produce them; from there, the training process and loss function take over.
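To make the decode/encode relationship concrete, here is a minimal sketch of the YOLOv2-style box equations described above. The function and variable names (decode_box, encode_box, and the convention that c_x, c_y are the grid cell offsets and p_w, p_h the anchor/prior dimensions, all in grid units) are my own choices for illustration, not anything defined in the course materials:

```python
import math

def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Turn raw network outputs t_i into a predicted box (YOLOv2-style).

    c_x, c_y: offsets of the grid cell's top-left corner (grid units).
    p_w, p_h: width/height of the anchor box (prior), same units.
    """
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    b_x = c_x + sigmoid(t_x)   # sigmoid keeps the center inside the cell
    b_y = c_y + sigmoid(t_y)
    b_w = p_w * math.exp(t_w)  # anchor is a multiplicative factor; exp > 0
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h

def encode_box(b_x, b_y, b_w, b_h, c_x, c_y, p_w, p_h):
    """Reverse engineer the t_i targets from a ground-truth box."""
    logit = lambda p: math.log(p / (1.0 - p))  # inverse of sigmoid
    return (logit(b_x - c_x), logit(b_y - c_y),
            math.log(b_w / p_w), math.log(b_h / p_h))
```

Note how the exponential makes the width/height strictly positive with no hard upper bound (matching the "0 to infinity" observation), and encode_box is exactly the reverse-engineering step used to build the training targets.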