YOLO training set

What training set do I need to train the YOLO framework?
My guess is that each training image needs a long label vector, filled out manually, containing a bounding box and class information for each anchor. Besides being cumbersome, this seems wasteful: I would need every class to appear in many possible locations, which would seem to require an exponential number of training images.

Hi Meir,

Here are my two cents:

Cumbersome, yes; exponential because of the possible locations, no. The network is trained to detect the shapes of the objects belonging to each class, and those shapes do not depend on where the object appears in the image.
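
On the cumbersome part, a rough sketch may help. In the label format commonly used with YOLO implementations, you annotate one entry per object (a class id plus a normalized box), and the long per-cell target vector is derived from those entries by code rather than typed in by hand. The layout below (a 7x7 grid, one box per cell, no anchors, 3 classes) and the name `encode_targets` are assumptions made for illustration, not the exact encoding of any particular YOLO version.

```python
import numpy as np

def encode_targets(boxes, grid_size=7, num_classes=3):
    """Turn (class_id, x_center, y_center, width, height) boxes, all
    normalized to [0, 1], into a grid_size x grid_size target array.

    Per-cell layout (an assumption for this sketch):
    [objectness, x_offset, y_offset, width, height, one-hot class...]
    """
    target = np.zeros((grid_size, grid_size, 5 + num_classes), dtype=np.float32)
    for class_id, xc, yc, w, h in boxes:
        # The cell that contains the box center is responsible for the object.
        col = min(int(xc * grid_size), grid_size - 1)
        row = min(int(yc * grid_size), grid_size - 1)
        target[row, col, 0] = 1.0                    # an object is present
        target[row, col, 1] = xc * grid_size - col   # center offset inside the cell
        target[row, col, 2] = yc * grid_size - row
        target[row, col, 3] = w                      # size relative to the image
        target[row, col, 4] = h
        target[row, col, 5 + class_id] = 1.0         # one-hot class
    return target

# Two hand-made annotations: that is all the manual work for this image.
boxes = [(0, 0.50, 0.50, 0.20, 0.30), (2, 0.10, 0.80, 0.10, 0.10)]
target = encode_targets(boxes)
print(target.shape)
print(int(target[..., 0].sum()))  # number of cells marked as containing an object
```

So the manual effort grows with the number of objects you label, not with the length of the target vector.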

Do you mean that the max pooling layers reduce the size of the image, thereby making the position of the objects irrelevant?

Hi Meir,

No, I am thinking of the features of objects belonging to each class. These are independent of location. But maybe I am misunderstanding something?
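
A small sketch of what I mean, assuming plain NumPy and a toy 2x2 pattern standing in for an object: a convolution-style filter is slid over every position of the image, so the same pattern produces the same peak response wherever it sits, and only the position of the peak moves. The helper names `correlate2d_valid` and `place` are made up for this example; a real YOLO network stacks many learned filters, but this sliding behaviour is the part that matters here.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy "object": a 2x2 diagonal pattern. The filter is the same pattern.
pattern = np.array([[1.0, 0.0],
                    [0.0, 1.0]])

def place(top, left, size=8):
    """Return an empty size x size image with the pattern at (top, left)."""
    image = np.zeros((size, size))
    image[top:top + 2, left:left + 2] = pattern
    return image

resp_a = correlate2d_valid(place(1, 1), pattern)
resp_b = correlate2d_valid(place(4, 5), pattern)

# Location of the strongest response in each image.
peak_a = np.unravel_index(np.argmax(resp_a), resp_a.shape)
peak_b = np.unravel_index(np.argmax(resp_b), resp_b.shape)

print(resp_a.max() == resp_b.max())  # same response strength
print(peak_a, peak_b)                # peak follows the object
```

The filter weights never had to see the object at both locations separately, which is why the training set does not need every class at every position.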