In the absence of further examples, let’s explore these one at a time. This might help us identify the source of some of the contradictory and incorrect information. In your understanding, is an anchor box position a pixel in an image? A 480x480 image contains 230,400 pixels; are there also 230,400 anchor boxes? If not, how many are there? Is it the same for every image? How is an anchor box position determined, assigned, and used in the code? If I recall correctly, the anchors.txt in the exercise contains 10 numbers that are read into a Python list. The code treats these as 5 pairs of 2 values; is one of those the position? The equations for predicted bounding box shape in the original YOLO papers (and in the code for this exercise) only describe and use two values for an anchor box. Is either one of those a position?
The equations show up in the research papers but also in several existing threads. Here is one:
Note: the paper refers to the anchor boxes as dimension priors, with p_w and p_h for width and height, respectively. (Spoiler alert: there is no anchor box position.)
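To make the point concrete, here is a minimal sketch of how 10 numbers become 5 (width, height) pairs and how a single pair enters the box equations from the YOLOv2 paper. The anchor values below are made up for illustration and are not copied from the exercise's anchors.txt; the helper names parse_anchors and decode_box are hypothetical, and the stride of 32 used for the count at the end is an assumption taken from YOLOv2 rather than from the exercise code.

```python
import math

# Illustrative values only; NOT the contents of the exercise's anchors.txt.
anchors_text = "0.5,0.75, 1.0,2.0, 2.0,1.0, 3.0,4.0, 6.0,5.0"


def parse_anchors(text):
    """Read a flat comma-separated list and group it into (p_w, p_h) pairs."""
    values = [float(v) for v in text.split(",")]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of anchor values")
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Box equations as given in the YOLOv2 paper, in grid-cell units.

    The position comes from the grid cell offset (c_x, c_y) plus the
    network outputs; the anchor contributes only p_w and p_h.
    """
    b_x = sigmoid(t_x) + c_x
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


anchors = parse_anchors(anchors_text)  # 5 pairs, each (width, height)

# With all raw outputs at zero, the box sits at the center of cell (3, 7)
# and takes exactly the anchor's width and height.
p_w, p_h = anchors[1]
box = decode_box(0.0, 0.0, 0.0, 0.0, c_x=3, c_y=7, p_w=p_w, p_h=p_h)

# Assumed stride of 32: a 480x480 input gives a 15x15 grid, so the number
# of predicted boxes is 15 * 15 * 5, far fewer than the pixel count.
grid_size = 480 // 32
num_predictions = grid_size * grid_size * len(anchors)
```

Note that the same 5 pairs are reused at every grid cell, so neither value in a pair can encode a location.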