About Course 4 Week 3, Assignment 1

Hello,
I have a question about the output for boxes. I know the shape of boxes is (10,) because we have 10 anchor boxes, and boxes[2] has four values, which correspond to coordinates. But why are the values here so big, and even negative? How exactly do these values describe the coordinates of the box? Could you please show a picture of what these values mean? Thanks a lot.
boxes[2] = [-1240.3483 -3212.5881 -645.78 2024.3052]

Also, when I run the code below:
image, image_data = preprocess_image("images/" + "test.jpg", model_image_size=(608, 608))
yolo_model_outputs = yolo_model(image_data)
print(yolo_model_outputs[0,0,2,0:5])
I got the following result:
<tf.Tensor: shape=(5,), dtype=float32, numpy=
array([  0.02702259,  -1.5739233 ,   0.6303474 ,  -1.9991553 ,
       -11.171314  ], dtype=float32)>
As I understand it, these represent the values of x, y, h and w. So my question is: why is y negative here, when Professor Andrew said the values should be between 0 and 1?

Hey @jiacheng_Cui,
These values do represent the coordinates of the corners of the boxes selected by the yolo_eval function. However, please don't try to attach any physical meaning to them, such as asking why they are large or negative.

This is because these values come from a set of test inputs that are themselves not realistic. If you take a look at the test cell after the yolo_eval function, you will see something like:

yolo_outputs = (tf.random.normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
                tf.random.normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
                tf.random.normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1),
                tf.random.normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1))

Here, the first two tensors represent the X-Y coordinates and the width/height of the output boxes from the encoding model. These are supposed to be positive, but here they are drawn from normal distributions (mean 1, standard deviation 4), so the tensors contain negative values as well.

In simple words, these inputs exist only to test your code, and since the inputs are flawed, the output values are flawed as well.
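
To see why random inputs like these lead to large and negative corners, here is a minimal sketch in plain Python. It is not the assignment's code. It assumes the corners are computed as centre minus/plus half the size and then scaled by an image of 720 x 1280 pixels, and `sample_corners` is a hypothetical helper. It uses Python's own random generator, so the numbers will not match the notebook's.

```python
import random


def sample_corners(rng, image_shape=(720.0, 1280.0)):
    """Draw one fake box like the test cell does and convert it to scaled corners.

    Assumption: corners = centre -/+ half size, then multiplied by the image
    height/width. The image shape is an illustrative value.
    """
    height, width = image_shape
    # Centre (x, y) and size (w, h) drawn from a normal with mean 1, stddev 4,
    # so negative centres and even negative sizes are common.
    x, y, w, h = (rng.gauss(1.0, 4.0) for _ in range(4))
    y_min, x_min = y - h / 2, x - w / 2
    y_max, x_max = y + h / 2, x + w / 2
    # Scale to pixel units: (y_min, x_min, y_max, x_max).
    return [y_min * height, x_min * width, y_max * height, x_max * width]


rng = random.Random(1)
boxes = [sample_corners(rng) for _ in range(1000)]

# Count boxes with at least one negative corner and find the largest magnitude.
num_negative = sum(1 for b in boxes if min(b) < 0)
largest = max(abs(v) for b in boxes for v in b)
print("boxes with a negative corner:", num_negative, "of", len(boxes))
print("largest absolute coordinate:", round(largest, 1))
```

With a standard deviation of 4 on values that should lie between 0 and 1, most sampled boxes end up with at least one negative corner, and multiplying by the image size pushes the magnitudes into the thousands.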

As for your second question, it is mentioned clearly in the assignment that:

The output of yolo_model is a (m, 19, 19, 5, 85) tensor that needs to pass through non-trivial processing and conversion. You will need to call yolo_head to format the encoding of the model you got from yolo_model into something decipherable.

So, trying to interpret yolo_model_outputs directly is not a good use of your time. You first need to pass it through the yolo_head function, and only then try to understand it. If you still want to understand the raw yolo_model_outputs, you can look at the source code of the YOLO implementation used in the assignment. I hope this helps.
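
For a rough idea of what that processing does, here is a small sketch in plain Python. It is not the assignment's yolo_head. It assumes the usual YOLOv2-style decoding (sigmoid on x, y and the confidence, exponential on the width and height scaled by an anchor) and assumes the five printed numbers are the raw x, y, width, height and confidence for one anchor in grid row 0, column 2. `decode_cell` and the anchor size (1.0, 1.0) are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_cell(raw, col, row, anchor_wh, grid=19):
    """Decode one anchor's raw outputs into image-relative values.

    Assumption: YOLOv2-style decoding; the real yolo_head may differ in detail.
    """
    tx, ty, tw, th, tc = raw
    # The sigmoid squashes the raw offset into (0, 1) inside the cell,
    # then the cell index is added and the sum is divided by the grid size.
    x = (sigmoid(tx) + col) / grid
    y = (sigmoid(ty) + row) / grid
    # The exponential keeps width and height positive for any raw value.
    w = anchor_wh[0] * math.exp(tw) / grid
    h = anchor_wh[1] * math.exp(th) / grid
    conf = sigmoid(tc)
    return x, y, w, h, conf


# Raw values copied from the question above.
raw = [0.02702259, -1.5739233, 0.6303474, -1.9991553, -11.171314]
decoded = decode_cell(raw, col=2, row=0, anchor_wh=(1.0, 1.0))
print(decoded)
```

Under these assumptions the negative raw y becomes a positive value between 0 and 1 after decoding, which is why the raw numbers should not be read as coordinates.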

Regards,
Elemento