How does YOLO know if 3 cells make 1 object?

ai_curious · August 12, 2023, 9:57pm

The grid cells and anchor boxes in YOLO don’t cooperate at all. Each grid cell + anchor box location, called a ‘detector’ in the original paper, makes its own set of class and location predictions based on its training. All of these predictions are made in parallel, and each one is completely independent of the others.
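
To make the "independent detectors" idea concrete, here is a minimal sketch in plain Python. The grid size, anchor count, class count, and the layout of the prediction vector are assumptions chosen for illustration (they are not taken from this thread), and `score_detector` is a hypothetical helper name. The point is only that scoring one detector needs nothing but that detector's own output vector.

```python
# Assumed sizes for illustration only.
GRID = 19       # grid cells per side
ANCHORS = 5     # anchor boxes per grid cell
CLASSES = 80    # object classes


def num_detectors(grid=GRID, anchors=ANCHORS):
    """One detector per (row, col, anchor) location."""
    return grid * grid * anchors


def score_detector(pred):
    """Score a single detector using only its own output vector.

    Assumed layout: pred = [p_obj, bx, by, bw, bh, c_0, ..., c_{K-1}]
    Returns (best_class_index, p_obj * best_class_prob, box).
    """
    p_obj = pred[0]
    box = tuple(pred[1:5])
    class_probs = pred[5:]
    # Pick the most likely class for this detector.
    best = max(range(len(class_probs)), key=lambda k: class_probs[k])
    return best, p_obj * class_probs[best], box
```

Note that `score_detector` takes no argument describing neighboring cells or other anchors, which is the "no cooperation" property in code form.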

In post-CNN processing (non-max suppression), possible duplicate predictions of the same object made by multiple ‘detectors’ are disambiguated and filtered so that only the highest-confidence prediction is retained.
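
A rough sketch of that filtering step, assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates and using an IoU threshold of 0.5 picked purely for illustration. This is a simplified, class-agnostic greedy non-max suppression, not the exact code from any particular YOLO implementation; the function names are my own.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the best-scoring box in each group of heavily overlapping boxes.

    detections: list of (score, box) pairs.
    Returns the surviving detections, highest score first.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    for det in ranked:
        # Drop this box if it overlaps too much with a better box already kept.
        if all(iou(det[1], k[1]) < iou_threshold for k in kept):
            kept.append(det)
    return kept
```

Nothing is merged here: overlapping boxes are compared only so the weaker ones can be discarded.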

There is no information sharing across ‘detectors’ and no merging. If an object is spread across multiple grid cells, the center of the object is in only one of them, and that grid cell is the one that should be making the prediction for the entire object - not for only the part of the object within its grid cell. If each of the grid cells makes a prediction (again, for the entire object), then the predictions are ranked by confidence and the lower-confidence ones are suppressed.
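
As a small illustration of the "center decides the cell" rule, the sketch below maps a ground-truth box to the one grid cell that contains its center, and separately counts how many cells the box touches. The 608x608 image size and 19x19 grid in the usage note are assumed values, and both function names are hypothetical.

```python
def responsible_cell(box, image_w, image_h, grid=19):
    """Return (row, col) of the grid cell containing the box center.

    box = (x1, y1, x2, y2) in pixels.
    """
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    # Clamp so a center on the far image edge still lands in the last cell.
    col = min(int(cx / image_w * grid), grid - 1)
    row = min(int(cy / image_h * grid), grid - 1)
    return row, col


def cells_touched(box, image_w, image_h, grid=19):
    """Count how many grid cells the box overlaps (for comparison only)."""
    col_lo = min(int(box[0] / image_w * grid), grid - 1)
    col_hi = min(int(box[2] / image_w * grid), grid - 1)
    row_lo = min(int(box[1] / image_h * grid), grid - 1)
    row_hi = min(int(box[3] / image_h * grid), grid - 1)
    return (col_hi - col_lo + 1) * (row_hi - row_lo + 1)
```

For example, with a 608x608 image and the box `(40, 40, 140, 100)`, the box touches 12 cells, but only cell `(2, 2)` holds the center, so only that cell's detectors are trained to predict the whole box.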

@DHAiRYA there are already several old threads covering this topic. Try the search.

Here is one example of a related thread:

Topic	Category	Replies	Views	Activity
YOLO - How come algortihm predicts mutiple bounding box without knowing cordinates of it?	Convolutional Neural Networks	2	631	December 2, 2021
Course4 Week3: Understanding YOLO Algorithm	Convolutional Neural Networks	5	815	March 18, 2025
How does a cell detect a bounding box bigger than itself, YOLO?	Convolutional Neural Networks	6	823	July 10, 2021
Detecting Multiple Objects using YOLO - Grid Cells plus Anchor Boxes	Convolutional Neural Networks	6	1541	March 16, 2024
YOLO - How does Bounding box get identified when Object spawns multiple sliding windows(Grids)	Convolutional Neural Networks	2	731	November 25, 2021
