Dear:
I have a question about the output for boxes. I know the shape of boxes is (10,) because we have 10 anchor boxes, and that boxes[2] has four values, which correspond to coordinates. But why are the values here so big, and even negative? What exactly do these values tell us about the coordinates of the box? Could you please show a picture of what these values mean? Thanks a lot.
boxes[2] = [-1240.3483 -3212.5881 -645.78 2024.3052]
Also, when we run the code below:
image, image_data = preprocess_image("images/" + "test.jpg", model_image_size=(608, 608))
yolo_model_outputs = yolo_model(image_data)
print(yolo_model_outputs[0,0,2,0:5])
I got the following result:
<tf.Tensor: shape=(5,), dtype=float32, numpy=
array([ 0.02702259, -1.5739233 , 0.6303474 , -1.9991553 ,
-11.171314 ], dtype=float32)>
which represents the values of x, y, h and w. So my question is: why do we have a negative y here, when Professor Andrew said the value should be between 0 and 1?
Hey @jiacheng_Cui,
These values indeed represent the coordinates of the corners of the boxes selected by the yolo_eval function. However, please don't try to attach any physical significance to them, such as why they are large or negative.
This is because these values come only from the set of test values, which are themselves flawed. If you take a look at the test cell after the yolo_eval function, you will see something like:
yolo_outputs = (tf.random.normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
tf.random.normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
tf.random.normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1),
tf.random.normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1))
Here, the first 2 tensors represent the X-Y coordinates and the Width/Height of the output boxes from the encoding model. These are supposed to be positive, but they have been drawn from Normal distributions (mean 1, standard deviation 4), so the tensors contain negative values as well.
In simple words, these values are only there to test your code, and since the inputs are flawed, the output values are flawed as well.
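To make this concrete, here is a rough sketch in plain Python (no TensorFlow) of how such inputs can turn into large, negative corner values. It assumes the pipeline converts (x, y, w, h) to corners in the order (y_min, x_min, y_max, x_max) and then scales them by an image shape of (720, 1280). The helper names boxes_to_corners and scale_box, the corner order, and the image shape are my assumptions for illustration and are not copied from the assignment.

```python
import random


def boxes_to_corners(box_xy, box_wh):
    """Convert a centre (x, y) and size (w, h) into (y_min, x_min, y_max, x_max)."""
    x, y = box_xy
    w, h = box_wh
    return (y - h / 2, x - w / 2, y + h / 2, x + w / 2)


def scale_box(corners, image_shape=(720.0, 1280.0)):
    """Scale normalised corners to pixels: y by image height, x by image width.

    The default image shape is an assumption made for this sketch.
    """
    height, width = image_shape
    y_min, x_min, y_max, x_max = corners
    return (y_min * height, x_min * width, y_max * height, x_max * width)


# A well-behaved box: centre and size are fractions of the image in [0, 1].
good = scale_box(boxes_to_corners((0.5, 0.5), (0.2, 0.4)))

# A test-cell style box: every value is drawn from Normal(mean=1, stddev=4),
# so the centre and even the width/height can be negative or far above 1.
rng = random.Random(1)
xy = (rng.gauss(1, 4), rng.gauss(1, 4))
wh = (rng.gauss(1, 4), rng.gauss(1, 4))
bad = scale_box(boxes_to_corners(xy, wh))

print("sane box  :", good)
print("random box:", bad)
```

The sane box stays inside the image, while the random one typically lands in the hundreds or thousands of pixels, with either sign. That is the same kind of output as boxes[2] above.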
As for your second question, it is mentioned clearly in the assignment that:
The output of yolo_model is a (m, 19, 19, 5, 85) tensor that needs to pass through non-trivial processing and conversion. You will need to call yolo_head to format the encoding of the model you got from yolo_model into something decipherable.
So, trying to understand yolo_model_outputs directly is not a good use of your time. You need to pass it through the yolo_head function first, and only then try to interpret it. If you still want to understand the raw yolo_model_outputs, you can look at the source code of the YOLO implementation used in the assignment. I hope this helps.
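For some intuition about why the raw values are not confined to [0, 1], here is a minimal sketch of the kind of decoding a YOLOv2-style head usually applies: a sigmoid on the x/y offsets and on the confidence, and an exponential on the width/height. This is my assumption about the general scheme, not a copy of the assignment's yolo_head. The function decode_raw_box, the cell indices, the (1.0, 1.0) anchor, and the (x, y, w, h, confidence) ordering of the five raw numbers are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(t):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_raw_box(raw, cell_col, cell_row, anchor_wh, grid_size=19):
    """Decode five raw outputs (t_x, t_y, t_w, t_h, t_conf) for one anchor.

    Assumed YOLOv2-style scheme: the centre is a sigmoid offset inside the
    grid cell, and the size is the anchor scaled by exp(raw value).
    """
    t_x, t_y, t_w, t_h, t_conf = raw
    x = (sigmoid(t_x) + cell_col) / grid_size
    y = (sigmoid(t_y) + cell_row) / grid_size
    w = anchor_wh[0] * math.exp(t_w) / grid_size
    h = anchor_wh[1] * math.exp(t_h) / grid_size
    return x, y, w, h, sigmoid(t_conf)


# The five raw numbers printed in the question above.
raw = [0.02702259, -1.5739233, 0.6303474, -1.9991553, -11.171314]

# Cell position and anchor size are placeholders, not values from the model.
x, y, w, h, conf = decode_raw_box(raw, cell_col=0, cell_row=0, anchor_wh=(1.0, 1.0))
print(f"x={x:.4f} y={y:.4f} w={w:.4f} h={h:.4f} confidence={conf:.6f}")
```

Under these assumptions, the negative raw y becomes a small positive fraction of the image height after the sigmoid, which is consistent with what was said in the lecture about decoded values lying between 0 and 1.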
Regards,
Elemento