A clarification about Image Classification and Localization Algorithm and YOLO

paulinpaloalto · August 28, 2022, 6:56pm

YOLO is by far the most complicated system we’ve seen so far, so it’s no wonder that it takes some serious head-scratching to understand. The point is not that the algorithm can’t see things outside of the current grid cell: the grid cells are just used to organize the computation. A given object will be reported only for the grid cell that contains its centroid, but there is no requirement that the bounding box of the object lie completely within that grid cell. The bounding box “is what it is”. Over the next couple of lectures and in the assignment, you’ll also see how they deal with the fact that the same object can be reported multiple times with slightly different bounding boxes. Prof Ng doesn’t really say much about how all this complexity gets trained, but it’s a safe bet that “it’s complicated”.
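To make the two ideas above concrete, here is a minimal sketch in plain Python. It is not the course's or the paper's implementation. The 19x19 grid, the 608x608 image size, the corner-style box format, and the function names `encode_box`, `iou` and `non_max_suppression` are all assumptions chosen for illustration. The first function shows that the cell is picked from the centroid alone while the box width and height are free to exceed one cell. The second part shows non-max suppression, the usual way of collapsing several overlapping reports of the same object into one.

```python
def encode_box(cx, cy, w, h, img_w=608, img_h=608, grid=19):
    """Pick the grid cell that owns a box and express the box in cell units.

    (cx, cy) is the box centroid and (w, h) its size, all in pixels.
    The image size and grid size are illustrative assumptions.
    """
    cell_w = img_w / grid
    cell_h = img_h / grid
    # Only the centroid decides which cell reports the object.
    col = min(int(cx // cell_w), grid - 1)
    row = min(int(cy // cell_h), grid - 1)
    # Centroid offset inside the owning cell, always in [0, 1).
    bx = cx / cell_w - col
    by = cy / cell_h - row
    # Width and height in cell units; nothing caps these at 1.
    bw = w / cell_w
    bh = h / cell_h
    return (row, col), (bx, by, bw, bh)


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes to keep, highest score first.

    A box is dropped if it overlaps an already kept box by more
    than iou_threshold (the threshold value is an assumption).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # A 200x100 pixel car centred at (300, 400): one cell owns it,
    # but the box spans several 32-pixel cells.
    cell, box = encode_box(300, 400, 200, 100)
    print(cell, box)

    # Two near-duplicate reports of one object plus one separate object.
    boxes = [(100, 100, 300, 200), (110, 105, 310, 205), (400, 400, 450, 450)]
    scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(boxes, scores))
```

With these assumed sizes each cell is 32 pixels wide, so the car is owned by the cell at row 12, column 9, yet its encoded width comes out as 6.25 cells. The second report of the first object overlaps the higher-scoring one heavily, so it is the one that gets dropped.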

If you have more detailed questions about any of this and want to go deeper, there are some great threads from fellow student ai_curious, who has done some serious work using and studying YOLO and then writing about it. Here’s a good one to start on and this one is more specific to the question of multiple bounding boxes.
