How does a cell detect a bounding box bigger than itself, YOLO?

This is correct, but it's worth seeing how that happens, as described in one of the replies above in this thread. At training time you know the ground-truth bounding box location, the training image dimensions, and the number of grid cells. From these, you can calculate which pixel is the ground-truth bounding box center and map it to one specific grid cell. That grid cell is then given a 1 for object presence, and every other grid cell a 0. Object presence and center location within the cell are components of the cost function, so the network 'learns' to mimic this manual assignment. At runtime, it simply makes predictions from the input signal and the learned parameters of the network.
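A minimal sketch of that training-time assignment (helper name, box format, and the 448x448 / 7x7 sizes are my assumptions for illustration, matching the original YOLO paper's defaults):

```python
def assign_grid_cell(box, img_w, img_h, grid_size=7):
    """box = (x_min, y_min, x_max, y_max) in pixels.
    Returns the (col, row) of the grid cell containing the box center;
    that cell gets object presence 1 at training time, all others 0."""
    cx = (box[0] + box[2]) / 2.0  # box center x in pixels
    cy = (box[1] + box[3]) / 2.0  # box center y in pixels
    # Scale the center into grid coordinates and floor to a cell index;
    # clamp so a center exactly on the right/bottom edge stays in range.
    col = min(int(cx / img_w * grid_size), grid_size - 1)
    row = min(int(cy / img_h * grid_size), grid_size - 1)
    return col, row

# A 448x448 image with a 7x7 grid: each cell covers 64x64 pixels.
# This box's center is at pixel (200, 300), which lands in cell (3, 4).
print(assign_grid_cell((100, 200, 300, 400), 448, 448))  # → (3, 4)
```

Note the box itself can be far larger than the 64x64 cell; only its center pixel decides which cell is responsible for predicting it.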

For objects in the middle of, and wholly contained by, the region corresponding to one grid cell, the center is predicted well. For objects that straddle grid cell regions and/or are bigger than a single cell, you may get multiple predictions for the same object. Non-max suppression comes into play to disambiguate them.
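For concreteness, here is a sketch of the greedy non-max suppression step (the function names and the 0.5 IoU threshold are illustrative choices, not specific to any particular YOLO implementation):

```python
def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, discard remaining boxes
    that overlap it above the IoU threshold, then repeat on what's left."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two nearly identical detections of one object, plus one distant object:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the duplicate at index 1 is suppressed
```

So when several grid cells each emit a box for the same large or straddling object, the lower-confidence duplicates get suppressed and one detection survives.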