Training data for YOLO

Aditya_Ranganath · January 13, 2024, 8:14pm

Say you have human-labeled training data in which images are split into 19x19 grid cells. Now suppose you want to use this dataset for your YOLO algorithm, but with 25x25 grid cells. How would you do it? It is impractical to relabel millions of images manually on a 25x25 grid. How can you use the available 19x19 grid training data for a 25x25 grid based YOLO algorithm?

paulinpaloalto · January 13, 2024, 9:40pm

If memory serves, the only thing the grid is used for in terms of YOLO data is deciding which grid cell contains the centroid of a given object. There is no requirement that any object be completely contained in a given grid cell. So I would think you could just write an algorithm to do the conversion here: you have the input with the centroids and associated grid cells, and you know the cell dimensions. Now compute the new cells and figure out where each centroid belongs in the new grid. You also have to rescale the sizes of the bounding boxes, as I think they are expressed in multiples of the grid cell size. It’s all just analytic geometry, right? No ML involved. René Descartes could have given you the solution in 1645.
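The conversion described above can be sketched in a few lines. This is a minimal sketch, not a definitive implementation: the function name `regrid_label` and the label layout are hypothetical, and it assumes each label stores the cell row and column, the centroid offset within that cell as a fraction in [0, 1), and the box width and height in multiples of the cell size. If your dataset stores sizes in pixels or as fractions of the image, the size rescaling step does not apply.

```python
def regrid_label(row, col, x_off, y_off, w_cells, h_cells, old_s=19, new_s=25):
    """Re-express one object label from an old_s x old_s grid on a new_s x new_s grid.

    Assumed (hypothetical) label layout:
      row, col         -- index of the cell containing the centroid
      x_off, y_off     -- centroid offset inside that cell, in [0, 1)
      w_cells, h_cells -- box size in multiples of the cell size
    """
    # Step 1: recover grid-independent coordinates as fractions of the image.
    cx = (col + x_off) / old_s
    cy = (row + y_off) / old_s
    w = w_cells / old_s
    h = h_cells / old_s

    # Step 2: express the centroid in units of the new cell size.
    gx = cx * new_s
    gy = cy * new_s

    # Step 3: the integer part is the new cell; the remainder is the new offset.
    # The min() guards against a centroid that sits exactly on the far edge.
    new_col = min(int(gx), new_s - 1)
    new_row = min(int(gy), new_s - 1)

    # Step 4: box sizes are rescaled to multiples of the new cell size.
    return new_row, new_col, gx - new_col, gy - new_row, w * new_s, h * new_s
```

For example, `regrid_label(9, 9, 0.5, 0.5, 1.9, 3.8)` describes an object centered in the image; on the 25x25 grid its centroid lands in cell (12, 12), again halfway across the cell.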

Aditya_Ranganath · January 13, 2024, 11:51pm

Thanks @paulinpaloalto.

I was thinking that, but it felt too simple to be true.

ai_curious · January 15, 2024, 2:22am

The raw data used for training YOLO knows nothing about grids; it’s just object location and type information. As @paulinpaloalto suggests above, the raw training data is mapped algebraically into a ground truth matrix with the same dimensions as the YOLO network output layer, based on the number of grid cells and the number of pixels per grid cell (and the number of anchor boxes and classes). This is what enables the loss function to compute the error, i.e. the difference between the ground truth, Y, and the predicted output, \hat{Y}. If you change the number of grid cells, you alter the shape of both Y and \hat{Y} and must redo the cell assignment of the objects.
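To make the shape dependence concrete, here is a rough sketch of building Y from grid-free labels. The function name `build_ground_truth`, the per-object tuple layout, and the slot order `[p_c, b_x, b_y, b_w, b_h, classes...]` are all assumptions for illustration; real implementations differ in slot order and normally choose the anchor by best IoU, whereas here the anchor index is simply given.

```python
def build_ground_truth(objects, grid_size, num_anchors, num_classes):
    """Build Y with shape (grid_size, grid_size, num_anchors, 5 + num_classes).

    objects: iterable of (cx, cy, w, h, class_id, anchor_id), where cx, cy, w, h
    are fractions of the image size (grid independent). Hypothetical layout.
    """
    depth = 5 + num_classes
    # Nested lists stand in for a 4-D array so the sketch needs no libraries.
    y = [[[[0.0] * depth for _ in range(num_anchors)]
          for _ in range(grid_size)]
         for _ in range(grid_size)]

    for cx, cy, w, h, class_id, anchor_id in objects:
        # The grid only decides which cell owns the centroid.
        col = min(int(cx * grid_size), grid_size - 1)
        row = min(int(cy * grid_size), grid_size - 1)

        slot = y[row][col][anchor_id]
        slot[0] = 1.0                      # objectness p_c
        slot[1] = cx * grid_size - col     # centroid offset inside the cell
        slot[2] = cy * grid_size - row
        slot[3] = w * grid_size            # size in multiples of the cell size
        slot[4] = h * grid_size
        slot[5 + class_id] = 1.0           # one-hot class
    return y
```

Calling this with `grid_size=19` and then `grid_size=25` on the same object list yields two differently shaped targets from identical raw labels, which is the point made above: the grid is a property of the encoding, not of the labeled data.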

ai_curious · January 15, 2024, 11:08am

By the way, if one is going to manually compute and assign training data locations and values as suggested above, it is important to remember that YOLO does not predict the bounding box location or shape values b_i directly. Rather, it predicts a set of four other numbers t_i that are related to the values of b_i by equations provided in the YOLO papers and discussed a bit here: Applying YOLO anchor boxes
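As a minimal sketch of that relationship, the code below assumes the parameterization from the YOLOv2 and YOLOv3 papers: b_x = sigmoid(t_x) + c_x, b_y = sigmoid(t_y) + c_y, b_w = p_w * exp(t_w), b_h = p_h * exp(t_h), where (c_x, c_y) is the top-left corner of the owning cell and (p_w, p_h) is the anchor box size, all in grid cell units. The function names and the `eps` clamp are illustrative choices, and other YOLO versions use different formulas.

```python
import math


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Map network outputs t_i to box values b_i (assumed YOLOv2/v3 form)."""
    b_x = 1.0 / (1.0 + math.exp(-t_x)) + c_x   # sigmoid keeps the center in its cell
    b_y = 1.0 / (1.0 + math.exp(-t_y)) + c_y
    b_w = p_w * math.exp(t_w)                  # anchor size scaled by exp(t)
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


def encode_box(b_x, b_y, b_w, b_h, c_x, c_y, p_w, p_h, eps=1e-6):
    """Invert decode_box: compute the targets t_i for a ground truth box b_i."""
    # Clamp the in-cell offset away from 0 and 1 so the logit stays finite.
    off_x = min(max(b_x - c_x, eps), 1.0 - eps)
    off_y = min(max(b_y - c_y, eps), 1.0 - eps)
    t_x = math.log(off_x / (1.0 - off_x))      # inverse sigmoid (logit)
    t_y = math.log(off_y / (1.0 - off_y))
    t_w = math.log(b_w / p_w)
    t_h = math.log(b_h / p_h)
    return t_x, t_y, t_w, t_h
```

Note that c_x and c_y depend on the grid, so targets encoded for a 19x19 grid have to be recomputed, not copied, when moving to 25x25.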

Cheers
