How do you set up "yolo_anchors.txt"?

Hi friend, mentor,

I have already passed the “Autonomous_driving_application_Car_detection” hw. My questions are about the “yolo_anchors.txt” file used in it.

Q1. I saw there are 10 numbers in the txt file, and I assume they represent the initial height and width of each of 5 different boxes. During learning, those 10 numbers will be changed. Is this correct?

Q2. If my understanding in Q1 is correct, how do I know what a good starting value is? For example, if the picture resolution is 100x100 and I know the target is a car about 20x80 in size, does the initial value become 20 80 (let’s assume only one box and one car in the picture)?

thank you!

That is not correct. Anchor box shapes are determined through exploratory data analysis of the training data. They are not learned by the network, and they do not change during training. If you change the number of anchor boxes, you change your network architecture and have to retrain, or at least do additional training with transfer learning. If you change the anchor box sizes, you still have to retrain, because that alters where labelled objects within the training images are assigned to locations in the y matrix.
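To make the last point concrete, here is a simplified sketch of YOLOv2-style label assignment. It assumes boxes are given as (x_center, y_center, w, h) relative to the image size and anchors as (w, h) in grid-cell units. Each ground truth box goes to the grid cell containing its center, and to the anchor whose shape overlaps it best (IoU computed with the centers aligned). The function names `shape_iou` and `assign_to_y` are hypothetical; this is not the assignment's actual preprocessing code.

```python
import numpy as np


def shape_iou(box_wh, anchors_wh):
    """IoU between one (w, h) box and each (w, h) anchor, computed as if
    all boxes shared the same center, so only the shapes matter."""
    inter = (np.minimum(box_wh[0], anchors_wh[:, 0])
             * np.minimum(box_wh[1], anchors_wh[:, 1]))
    union = box_wh[0] * box_wh[1] + anchors_wh[:, 0] * anchors_wh[:, 1] - inter
    return inter / union


def assign_to_y(box, anchors_wh, grid_size):
    """Return (row, col, anchor_index): the slot in y where this box is stored.

    box: (x_center, y_center, w, h), relative to the image (0..1).
    anchors_wh: array of shape (num_anchors, 2), in grid-cell units.
    """
    x, y, w, h = box
    # Grid cell that contains the box center (clamped for x or y == 1.0).
    col = min(int(x * grid_size), grid_size - 1)
    row = min(int(y * grid_size), grid_size - 1)
    # Best-matching anchor by shape only.
    box_wh_cells = np.array([w * grid_size, h * grid_size])
    best_anchor = int(np.argmax(shape_iou(box_wh_cells, anchors_wh)))
    return row, col, best_anchor
```

With the same tall, narrow box on a 19x19 grid, the anchor set `[[1, 1], [2, 8], [8, 2]]` places it at `(9, 9, 1)`, while `[[1, 1], [8, 2], [5, 5]]` places it at `(9, 9, 2)`. The label moves to a different slot in y, which is why changing anchor sizes means retraining.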

The YOLO authors used k-means clustering to select the number and the shapes of the anchor boxes. The approach is discussed in many places online. There is also one here:
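For readers who want to see the mechanics, below is a minimal sketch of that clustering step. It follows the idea in the YOLOv2 (YOLO9000) paper of clustering the (w, h) of all ground truth boxes with the distance d = 1 - IoU instead of Euclidean distance. Updating each centroid to the mean (w, h) of its members is a common choice (some implementations use the median), and the function names are hypothetical. The clustering is run over every labelled box in the training set, not a single image, and `mean_best_iou` can be compared across several values of k to pick the number of anchors.

```python
import numpy as np


def iou_wh(boxes, centroids):
    """Pairwise IoU between (w, h) boxes of shape (N, 2) and centroids of
    shape (K, 2), computed as if every box shared the same center."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    union = area_b[:, None] + area_c[None, :] - inter
    return inter / union


def kmeans_anchors(boxes_wh, k, iters=100, seed=0):
    """Cluster ground truth (w, h) pairs into k anchor shapes using d = 1 - IoU."""
    boxes_wh = np.asarray(boxes_wh, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct boxes from the data.
    centroids = boxes_wh[rng.choice(len(boxes_wh), size=k, replace=False)]
    for _ in range(iters):
        # Assign each box to the centroid with the smallest 1 - IoU distance.
        assignments = np.argmin(1.0 - iou_wh(boxes_wh, centroids), axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = boxes_wh[assignments == j]
            if len(members) > 0:  # empty clusters keep their old centroid
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Sort anchors by area, smallest first.
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]


def mean_best_iou(boxes_wh, anchors):
    """Average IoU between each box and its best-matching anchor."""
    ious = iou_wh(np.asarray(boxes_wh, dtype=float), anchors)
    return float(ious.max(axis=1).mean())
```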

Oh, I think I was confusing the bounding box with the anchor box. So the anchor box is not learned but predefined, and the bounding box is learned during training. Right?

thank you! I will check this later for sure.


You, me, and almost everyone else first learning this idea :grinning:

I believe there are three important, related concepts.

anchor boxes - chosen by humans through analysis of the training data. Once the number of anchors and their shapes are selected, they are constant throughout the training of one model. You could consider them hyperparameters, by which I mean you could try different settings and then compare model performance in cross validation. The number of anchor boxes influences the network output shape and thus the number of parameters to learn during training (and the memory and computation costs). The shapes of the anchor boxes influence the shapes of the predicted bounding boxes. Well-chosen anchors improve model stability during training and reduce training time and cost.
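A small illustration of how the number of anchors drives output shape and parameter count. The example values match the 19x19x5x85 output used in the course assignment (19x19 grid, 5 anchors, 80 classes). The final layer is assumed, hypothetically, to be a 1x1 convolution over 1024 input channels; real architectures may differ, so treat the parameter counts as an illustration only.

```python
def yolo_output_shape(grid_size, num_anchors, num_classes):
    # Each anchor in each grid cell predicts p_c, b_x, b_y, b_w, b_h
    # plus one score per class.
    return (grid_size, grid_size, num_anchors, 5 + num_classes)


def final_layer_params(in_channels, num_anchors, num_classes):
    # Weights plus biases of a hypothetical 1x1 conv head that outputs
    # num_anchors * (5 + num_classes) channels.
    out_channels = num_anchors * (5 + num_classes)
    return in_channels * out_channels + out_channels


print(yolo_output_shape(19, 5, 80))      # (19, 19, 5, 85)
print(final_layer_params(1024, 5, 80))   # 435625
print(final_layer_params(1024, 3, 80))   # 261375
```

Dropping from 5 to 3 anchors shrinks this head by about 40%, but every anchor slot in y moves, so the model has to be retrained.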

ground truth bounding boxes - provided as part of the labelled training data. They are static and independent of the number of grid cells, anchor boxes, network architecture, etc.; they are an intrinsic part of a training image and part of the y input data.

predicted bounding boxes - location and shape are part of the output of forward propagation through the YOLO network, the $\hat{y}$. The error $y - \hat{y}$ computed in the loss function is what drives parameter learning during training. Note that YOLO has a rather complex loss function that incorporates errors in location, shape, object presence, and class.
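To see how a fixed anchor and a learned prediction fit together, here is a sketch of decoding one predicted box, following the parameterization in the YOLOv2 (YOLO9000) paper: $b_x = \sigma(t_x) + c_x$, $b_y = \sigma(t_y) + c_y$, $b_w = p_w e^{t_w}$, $b_h = p_h e^{t_h}$. The network learns to output $t_x, t_y, t_w, t_h$; the anchor $(p_w, p_h)$ stays constant. Units are grid cells, and `decode_box` is a hypothetical helper, not the course's `yolo_head`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode_box(t, cell_col, cell_row, anchor_wh):
    """Turn raw outputs t = (t_x, t_y, t_w, t_h) for one anchor in one grid
    cell into a box (b_x, b_y, b_w, b_h), in grid-cell units."""
    t_x, t_y, t_w, t_h = t
    p_w, p_h = anchor_wh          # fixed anchor shape, never learned
    b_x = cell_col + sigmoid(t_x)  # center stays inside its cell
    b_y = cell_row + sigmoid(t_y)
    b_w = p_w * math.exp(t_w)      # learned scaling of the anchor shape
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


print(decode_box((0.0, 0.0, 0.0, 0.0), 9, 9, (2.0, 8.0)))  # (9.5, 9.5, 2.0, 8.0)
```

With all-zero outputs the predicted box is exactly the anchor shape centered in its cell, which is why anchors that already resemble the data make training easier.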

Hope this helps


thank you!!!