Target Label Y with grid cells

ai_curious · March 19, 2025, 2:04pm

I think this is a correct summary. It’s an engineering tradeoff. More grid cells mean the network can detect more objects per image. But since the entire prediction vector is computed for every grid cell and anchor box, computation and memory scale with the square of the grid dimension (an S x S grid has S^2 cells), which puts a business-driven practical upper bound on the grid size.
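
To make the scaling concrete, here is a rough sketch that counts only the values in the output tensor. The 19 x 19 grid, 5 anchor boxes, and 80 classes are illustrative assumptions on my part, and `prediction_count` is a hypothetical helper, not part of any library. The real compute cost also depends on the backbone network, which this ignores.

```python
def prediction_count(grid_size, num_anchors, num_classes):
    """Number of values the network outputs for one image."""
    # Each anchor box predicts p_c, 4 box coordinates, and one score per class.
    per_box = 5 + num_classes
    return grid_size * grid_size * num_anchors * per_box


# Assumed example sizes: 5 anchors, 80 classes.
base = prediction_count(19, 5, 80)
doubled = prediction_count(38, 5, 80)

# Doubling the grid dimension quadruples the output size.
print(base, doubled, doubled / base)
```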

The ground truth labels are initially associated with the training input, X. Often they are provided in a text file and may be XML or JSON. For YOLO training, these labels must be mapped in a preprocessing step to an array with the same shape as the network output, which is what you refer to as target label Y above. During training, the ground truth labels are iteratively compared with the network-generated predictions, \hat{Y}.
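
A minimal sketch of that preprocessing step, under several assumptions of mine: boxes arrive as `(class_id, x_center, y_center, width, height)` normalized to the image, each cell's vector is laid out as `[p_c, b_x, b_y, b_w, b_h, c_1..c_C]`, the anchor is chosen by shape-only IoU, and width/height are stored relative to the whole image. `encode_labels` and `shape_iou` are hypothetical names, and real pipelines differ in these details.

```python
import numpy as np


def shape_iou(w1, h1, w2, h2):
    """IoU of two boxes sharing the same center, so only their shapes matter."""
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union


def encode_labels(boxes, anchors, grid_size, num_classes):
    """Map ground truth boxes to a target array of shape (S, S, B, 5 + C)."""
    S, B = grid_size, len(anchors)
    y = np.zeros((S, S, B, 5 + num_classes), dtype=np.float32)
    for class_id, xc, yc, w, h in boxes:
        # The cell containing the box center is responsible for the object.
        col = min(int(xc * S), S - 1)
        row = min(int(yc * S), S - 1)
        # Pick the anchor whose shape best matches the box.
        best = max(range(B), key=lambda a: shape_iou(w, h, *anchors[a]))
        # Note: a second object in the same cell and anchor overwrites the first.
        y[row, col, best, 0] = 1.0  # objectness p_c
        # Center is stored as an offset within the cell (assumed convention).
        y[row, col, best, 1:5] = [xc * S - col, yc * S - row, w, h]
        y[row, col, best, 5 + class_id] = 1.0  # one-hot class
    return y


# Assumed toy setup: 3 x 3 grid, 2 anchors, 3 classes, one centered object.
anchors = [(0.1, 0.1), (0.2, 0.4)]
y = encode_labels([(2, 0.5, 0.5, 0.2, 0.4)], anchors, grid_size=3, num_classes=3)
print(y.shape)     # (S, S, B, 5 + C)
print(y[1, 1, 1])  # the only non-zero vector: center cell, second anchor
```

Every other cell and anchor slot stays all zeros, which is what tells the loss there is no object there.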

There is some related discussion here: Week 3: finding the correct cell in YOLO
