About course4 week3 on assignment1

jiacheng_Cui · June 11, 2022, 2:11pm

Hello,
I have a question about the boxes output. I understand that the shape of boxes is (10,) because we have 10 anchor boxes, and that boxes[2] holds four values corresponding to coordinates. Why are these values so large, and even negative? What exactly do they say about the coordinates of the box? Could you please show a picture explaining what they mean? Thanks a lot.
boxes[2] = [-1240.3483 -3212.5881 -645.78 2024.3052]

jiacheng_Cui · June 11, 2022, 3:57pm

Also, when I run the code below:
image, image_data = preprocess_image("images/" + "test.jpg", model_image_size=(608, 608))
yolo_model_outputs = yolo_model(image_data)
print(yolo_model_outputs[0,0,2,0:5])
I get the following result:
<tf.Tensor: shape=(5,), dtype=float32, numpy=
array([ 0.02702259, -1.5739233 , 0.6303474 , -1.9991553 ,
-11.171314 ], dtype=float32)>
As I understand it, these represent the values of x, y, h and w. So my question is: why is y negative here, when Professor Andrew said the values should be between 0 and 1?

Elemento · June 12, 2022, 6:00am

Hey @jiacheng_Cui,
These values do represent the corner coordinates of the boxes selected by the yolo_eval function. However, please don't try to attach any physical meaning to them, such as asking why they are large or negative.

That is because these values come from a set of test inputs that are themselves unrealistic. If you look at the test cell after the yolo_eval function, you will see something like:

yolo_outputs = (tf.random.normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
                tf.random.normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
                tf.random.normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1),
                tf.random.normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1))

Here, the first two tensors represent the X-Y coordinates and the width/height of the output boxes from the encoding model. These are supposed to be positive, but they have been generated from a normal distribution (mean 1, standard deviation 4), so the tensors contain negative values as well.

In short, these values exist only to test your code; since the inputs are unrealistic, the output values are unrealistic as well.
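To make this concrete, here is a minimal sketch in plain Python (not the assignment's code). It assumes that boxes are converted from centre/size to corners in the order (y_min, x_min, y_max, x_max) and then multiplied by the image height and width, taken here to be 720 and 1280; the helper names xywh_to_corners and scale_corners are hypothetical. It compares a realistic box with one whose x, y, w, h look like draws from a normal distribution with mean 1 and standard deviation 4.

```python
import random


def xywh_to_corners(x, y, w, h):
    """Convert a centre/size box to (y_min, x_min, y_max, x_max)."""
    return (y - h / 2, x - w / 2, y + h / 2, x + w / 2)


def scale_corners(corners, image_height, image_width):
    """Scale normalised corners to pixel units (assumed scaling step)."""
    y_min, x_min, y_max, x_max = corners
    return (y_min * image_height, x_min * image_width,
            y_max * image_height, x_max * image_width)


# About 40% of draws from N(mean=1, stddev=4) are negative.
rng = random.Random(1)
samples = [rng.gauss(1, 4) for _ in range(10000)]
negative_fraction = sum(s < 0 for s in samples) / len(samples)

# A realistic box: every value lies in [0, 1], so the corners stay inside the image.
realistic = scale_corners(xywh_to_corners(0.5, 0.5, 0.25, 0.5), 720, 1280)

# A box with values like the random test inputs: negative centre and height.
random_like = scale_corners(xywh_to_corners(-2.0, 1.0, 3.0, -4.0), 720, 1280)

print(negative_fraction)
print(realistic)    # (180.0, 480.0, 540.0, 800.0)
print(random_like)  # (2160.0, -4480.0, -720.0, -640.0)
```

The second box ends up with corners in the thousands, some of them negative, which is the same kind of output as boxes[2] in the question.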

As for your second question, it is mentioned clearly in the assignment that:

The output of yolo_model is a (m, 19, 19, 5, 85) tensor that needs to pass through non-trivial processing and conversion. You will need to call yolo_head to format the encoding of the model you got from yolo_model into something decipherable.

So, trying to interpret yolo_model_outputs directly is not a good use of your time. You need to pass it through the yolo_head function first, and only then try to interpret it. If you still want to understand the raw yolo_model_outputs, you can look at the source code of the YOLO implementation used in the assignment. I hope this helps.
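For intuition only, here is a rough sketch of the kind of decoding a YOLOv2-style head performs. It is not the assignment's yolo_head. The channel ordering (t_x, t_y, t_w, t_h, t_conf), the reading of index [0, 2] as grid row 0 and column 2, and the anchor size of 1.0 x 1.0 grid cells are all assumptions made for illustration. The raw numbers are the ones printed in the question.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Raw values from the question; the ordering is an assumption.
raw = [0.02702259, -1.5739233, 0.6303474, -1.9991553, -11.171314]
t_x, t_y, t_w, t_h, t_conf = raw

grid = 19                      # 19 x 19 grid, as in the assignment
row, col = 0, 2                # assumed meaning of the index [0, 2]
anchor_w, anchor_h = 1.0, 1.0  # hypothetical anchor size, in grid cells

# Centre: sigmoid gives an offset inside the cell, then normalise by the grid size.
box_x = (sigmoid(t_x) + col) / grid
box_y = (sigmoid(t_y) + row) / grid

# Size: exp keeps width and height positive, scaled by the anchor.
box_w = math.exp(t_w) * anchor_w / grid
box_h = math.exp(t_h) * anchor_h / grid

# Confidence: sigmoid again, so it lies in (0, 1).
confidence = sigmoid(t_conf)

print(sigmoid(t_y))  # about 0.17: the negative raw y becomes a value in (0, 1)
print(box_x, box_y, box_w, box_h, confidence)
```

Under these assumptions, the negative raw y is simply the input to a sigmoid, and the value between 0 and 1 mentioned in the lectures only appears after this step.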

Regards,
Elemento
