Say you have human-labeled training data in which the images are split into 19x19 grid cells. If you want to use this dataset for your YOLO algorithm but with a 25x25 grid, how would you do it? Relabeling millions of images manually on a 25x25 grid is impractical. How can the existing 19x19-grid training data be used for a 25x25-grid YOLO algorithm?
If memory serves, the only thing the grid is used for in YOLO training data is deciding which grid cell contains the centroid of a given object. There is no requirement that an object be completely contained in a grid cell. So I would think you could just write an algorithm to do the conversion: you have the input with the centroids and their associated grid cells, and you know the cell dimensions. Compute the new cells and figure out which cell of the new grid each centroid falls into. You also have to rescale the bounding box sizes, since I think they are expressed in multiples of the grid cell size. It's all just analytic geometry, right? No ML involved. René Descartes could have given you the solution in 1645.
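A minimal sketch of that conversion, assuming a hypothetical label layout (`CellLabel`) in which each object stores its cell indices, a centroid offset in [0, 1) inside that cell, and a width/height in units of the grid cell size. The idea is to go back to image-normalized coordinates and then re-express everything relative to the new grid. Adjust the fields if your labels are stored differently (e.g. sizes relative to the whole image, in which case they need no rescaling).

```python
from dataclasses import dataclass


@dataclass
class CellLabel:
    # Hypothetical label layout, not a standard YOLO file format.
    row: int    # grid row containing the box centroid
    col: int    # grid column containing the box centroid
    x: float    # centroid x offset inside the cell, in [0, 1)
    y: float    # centroid y offset inside the cell, in [0, 1)
    w: float    # box width in units of grid-cell width
    h: float    # box height in units of grid-cell height
    cls: int    # object class index


def regrid(label: CellLabel, old_s: int, new_s: int) -> CellLabel:
    """Re-express a label from an old_s x old_s grid on a new_s x new_s grid."""
    # 1. Back to image-normalized coordinates, where the image spans [0, 1].
    cx = (label.col + label.x) / old_s
    cy = (label.row + label.y) / old_s
    w_img = label.w / old_s
    h_img = label.h / old_s
    # 2. Find the new cell containing the centroid (clamp guards cx == 1.0).
    new_col = min(int(cx * new_s), new_s - 1)
    new_row = min(int(cy * new_s), new_s - 1)
    # 3. Offsets and sizes relative to the new cell size.
    return CellLabel(
        row=new_row,
        col=new_col,
        x=cx * new_s - new_col,
        y=cy * new_s - new_row,
        w=w_img * new_s,
        h=h_img * new_s,
        cls=label.cls,
    )
```

For example, `regrid(CellLabel(9, 9, 0.5, 0.5, 2.0, 3.0, 4), 19, 25)` moves an object centred in the image from cell (9, 9) to cell (12, 12) and scales its width from 2 to about 2.63 cell widths. Because 25x25 cells do not nest inside 19x19 cells, objects that shared a cell can be split apart and objects from neighbouring cells can end up together, so any per-cell anchor assignment has to be redone after the conversion.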
Thanks @paulinpaloalto.
I was thinking that, but it felt too simple to be true.
The raw data used for training YOLO knows nothing about grids; it's just object location and type information. As @paulinpaloalto suggests above, the raw label data is mapped algebraically into a ground truth matrix with the same dimensions as the YOLO network output layer, determined by the number of grid cells and the number of pixels per grid cell (and the number of anchor boxes and classes). This is what enables the loss function to compute the error, the difference between the ground truth, Y, and the predicted output, \hat{Y}. If you change the number of grid cells, you change the shape of both Y and \hat{Y} and must redo the cell assignment of the objects.
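To make the "same shape as the output layer" point concrete, here is a sketch that builds Y from grid-free annotations. It assumes boxes given as `(cx, cy, w, h, cls)` normalized to [0, 1], anchors given as normalized `(w, h)` pairs, and a hypothetical per-anchor layout `[p_c, b_x, b_y, b_w, b_h, one-hot class]`; real implementations order these fields differently. Anchors are chosen by shape-only IoU, one common choice but not the only one. The grid size `s` is just a parameter, so the same annotations yield a 19x19 or a 25x25 target.

```python
import numpy as np


def iou_wh(wh, anchors):
    # IoU between one box and each anchor, assuming they all share a centre.
    inter = np.minimum(wh[0], anchors[:, 0]) * np.minimum(wh[1], anchors[:, 1])
    union = wh[0] * wh[1] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union


def build_ground_truth(boxes, s, anchors, num_classes):
    """Return Y with shape (s, s, B, 5 + num_classes) for B anchors.

    boxes: iterable of (cx, cy, w, h, cls) in image-normalized units.
    If two objects map to the same cell and anchor, the later one wins.
    """
    anchors = np.asarray(anchors, dtype=float)
    y = np.zeros((s, s, len(anchors), 5 + num_classes))
    for cx, cy, w, h, cls in boxes:
        # Cell containing the centroid (clamped so cx == 1.0 stays on the grid).
        col = min(int(cx * s), s - 1)
        row = min(int(cy * s), s - 1)
        # Anchor whose shape best matches the box.
        a = int(np.argmax(iou_wh((w, h), anchors)))
        y[row, col, a, 0] = 1.0                              # p_c: object present
        y[row, col, a, 1:3] = (cx * s - col, cy * s - row)   # offset in cell
        y[row, col, a, 3:5] = (w * s, h * s)                 # size in cell units
        y[row, col, a, 5 + int(cls)] = 1.0                   # one-hot class
    return y
```

Calling it once with `s=19` and once with `s=25` on the same boxes produces targets of different shapes, which is why the network's output layer and the targets must change together.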
By the way, if you are going to manually compute and assign training data locations and values as suggested above, remember that YOLO does not predict the bounding box location or shape values b_i directly. Rather, it predicts a set of four other numbers, t_i, that are related to the values of b_i by equations given in the YOLO papers and discussed a bit here: Applying YOLO anchor boxes
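For reference, a sketch of the YOLOv2 (YOLO9000) form of those equations, where (c_x, c_y) is the top-left corner of the cell in grid units and (p_w, p_h) is the anchor prior: b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·e^{t_w}, b_h = p_h·e^{t_h}. Other YOLO versions parameterize boxes differently, so treat this as one variant. `encode` inverts `decode` to get the t_i a labeled box corresponds to; the `eps` clipping is an assumption of this sketch, needed because the logit is infinite at offsets of exactly 0 or 1.

```python
import math


def _sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def _logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def decode(t, cell, prior):
    """Map raw outputs t = (t_x, t_y, t_w, t_h) to a box (b_x, b_y, b_w, b_h)
    in grid-cell units, given cell = (c_x, c_y) and prior = (p_w, p_h)."""
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = prior
    return (_sigmoid(tx) + cx, _sigmoid(ty) + cy, pw * math.exp(tw), ph * math.exp(th))


def encode(b, cell, prior, eps=1e-6):
    """Inverse of decode: the t values that reproduce box b = (b_x, b_y, b_w, b_h)."""
    bx, by, bw, bh = b
    cx, cy = cell
    pw, ph = prior
    # Clip the in-cell offsets away from 0 and 1 so the logit stays finite.
    ox = min(max(bx - cx, eps), 1.0 - eps)
    oy = min(max(by - cy, eps), 1.0 - eps)
    return (_logit(ox), _logit(oy), math.log(bw / pw), math.log(bh / ph))
```

With all t_i equal to zero, `decode` returns the centre of the cell and exactly the anchor's size, e.g. `decode((0, 0, 0, 0), (3, 4), (1.0, 2.0))` gives `(3.5, 4.5, 1.0, 2.0)`.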
Cheers