Dimension for anchor boxes

jacknguyen101 · June 25, 2021, 3:33am

In the programming exercises, we have:

The dimension for anchor boxes is the second to last dimension in the encoding: (𝑚, 𝑛_𝐻, 𝑛_𝑊, 𝑎𝑛𝑐ℎ𝑜𝑟𝑠, 𝑐𝑙𝑎𝑠𝑠𝑒𝑠).
Actually, I don’t quite get it. I thought that the anchor boxes were defined by 𝑛_𝐻 and 𝑛_𝑊 (their height and width), and that the dimensions of each anchor box should be (𝑝_𝑐, 𝑏_𝑥, 𝑏_𝑦, 𝑏_ℎ, 𝑏_𝑤, 𝑐𝑙𝑎𝑠𝑠𝑒𝑠).
I hope you could help me clarify this!
Thanks a lot!

reinoudbosch · June 27, 2021, 4:47pm

Hi jacknguyen101,

The dimension of the anchor boxes mentioned here refers to the dimension of the training set. Each of the m images in the training set has height n_H and width n_W, and belongs to a particular anchor (anchors) and a particular class (classes).

The dimension you are referring to with (pc, bx, by, bh, bw, classes) is the dimension of the output of the model.

ai_curious · December 21, 2021, 4:01pm

There are 3 values one might consider dimensions related to anchor boxes. The first is the number of anchor boxes being used. In the original YOLO research paper, the number of boxes per grid cell was 2. In the car detection programming exercise the number is 5. The other two dimensions are the height and width of the anchor boxes themselves. There is a utility file in the exercise called yolo_anchors.txt that contains 10 values: a height and a width for each of the 5 anchor boxes. These are completely independent of the shape of the input image and of the number of training images being used. In the original post above, anchors is 5.
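
To make the "10 values, 5 anchor boxes" point concrete, here is a minimal sketch of how such a file could be read. The numbers in `anchors_text` are made up for illustration and are not the real contents of yolo_anchors.txt, and the order of the two values in each pair is an assumption.

```python
import numpy as np

# Hypothetical stand-in for the text stored in yolo_anchors.txt:
# 10 comma-separated numbers. These values are invented for the sketch.
anchors_text = "1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0, 3.5, 6.0, 6.5"

values = np.array([float(v) for v in anchors_text.split(",")])

# Group the flat list into pairs: one row per anchor box,
# two size values (height and width) per row.
anchors = values.reshape(-1, 2)
num_anchors = anchors.shape[0]

print(anchors.shape)   # (5, 2)
print(num_anchors)     # 5
```

Nothing here depends on the image size or on m, which is why `num_anchors` shows up as its own axis in the encoding.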

reinoudbosch · December 26, 2021, 8:49pm

Hi ai_curious,

The statement in the assignment refers to the total collection of scores based on the anchor boxes:

“The dimension for anchor boxes is the second to last dimension in the encoding: (𝑚, 𝑛_𝐻, 𝑛_𝑊, 𝑎𝑛𝑐ℎ𝑜𝑟𝑠, 𝑐𝑙𝑎𝑠𝑠𝑒𝑠)
The YOLO architecture is: IMAGE (m, 608, 608, 3) → DEEP CNN → ENCODING (m, 19, 19, 5, 85).”

For clarity, it might have been better if the text had read something like ‘The dimension of the encoding tensor based on the anchor boxes is (𝑚, 𝑛_𝐻, 𝑛_𝑊, 𝑎𝑛𝑐ℎ𝑜𝑟𝑠, 𝑐𝑙𝑎𝑠𝑠𝑒𝑠)’.
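
As a rough illustration of "second to last dimension", the sketch below builds a placeholder array with the ENCODING shape quoted above. The batch size `m = 4` and the zero values are assumptions for the example, not anything taken from the assignment's code.

```python
import numpy as np

m = 4  # arbitrary batch size, chosen only for this sketch
encoding = np.zeros((m, 19, 19, 5, 85))  # placeholder values, not real predictions

print(encoding.ndim)        # 5
print(encoding.shape[-2])   # 5  -> the anchor-box axis (second to last)
print(encoding.shape[-1])   # 85 -> the values stored per anchor box

# Selecting a single anchor box removes that axis.
first_anchor = encoding[:, :, :, 0, :]
print(first_anchor.shape)   # (4, 19, 19, 85)
```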

ai_curious · December 27, 2021, 12:55pm

If the lower case encoding (𝑚, 𝑛_𝐻, 𝑛_𝑊, 𝑎𝑛𝑐ℎ𝑜𝑟𝑠, 𝑐𝑙𝑎𝑠𝑠𝑒𝑠) and the upper case ENCODING (m, 19, 19, 5, 85) are supposed to be referring to the same thing, then like the original poster I find them incongruent. The word classes in this exercise means 80, does it not? It should be (1 + 4 + classes).

Also, it’s a little imprecise to state that (pc, bx, by, bh, bw, classes) is the dimension of the output of the model. Shouldn’t that be S*S*B*(1+4+classes)?
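
For what it's worth, the arithmetic can be checked with a short sketch, assuming the values quoted in this thread (a 19 x 19 grid, 5 anchor boxes, 80 classes). The variable names are mine, not the assignment's.

```python
import numpy as np

S, B, num_classes = 19, 5, 80      # grid size, anchor boxes, classes (from this thread)
per_box = 1 + 4 + num_classes      # p_c, four box coordinates, class scores

values_per_image = S * S * B * per_box
print(per_box)            # 85
print(values_per_image)   # 153425

# The same count, seen as the size of one image's encoding tensor.
one_image = np.zeros((S, S, B, per_box))
assert one_image.size == values_per_image
```

So the 85 in (m, 19, 19, 5, 85) is 1 + 4 + 80, and the output for a single image holds 153425 numbers.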

reinoudbosch · December 27, 2021, 5:38pm

Yes, it is all imprecise. It seems to me that the author of the assignment was trying to say a number of things at the same time, squeezing everything into one imprecise statement. Fortunately, this has not led to much confusion so far, as there has only been one question about this since the refresh. But I’ll report it at the backend.
