Autonomous_driving_application_Car_detection Exercise 1

In exercise 1, I am having trouble creating the filtering mask. I believe I have the correct dimensions for box_class_scores and box_classes; I printed the shapes to check. Here is the error I am receiving.


Not quite sure what I am doing wrong.

That probably means you’ve mixed up the computation of “scores” versus “classes”. Note that scores should be floating point values and classes should be integers. The error tells you that your box_class_scores are actually integers. So have a more careful look at that …
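A minimal NumPy sketch of that distinction, assuming a toy box_scores array of 2 boxes by 3 classes (made-up values) instead of the notebook's (19, 19, 5, 80) tensor; np.argmax and np.max stand in for the TensorFlow reductions used in the assignment. The argmax result is an integer class index and the max result is a float score, so only the latter makes sense to compare against a score threshold.

```python
import numpy as np

# Toy scores: 2 boxes x 3 classes (illustrative values, not from the notebook)
box_scores = np.array([[0.1, 0.7, 0.2],
                       [0.05, 0.15, 0.4]], dtype=np.float32)

box_classes = np.argmax(box_scores, axis=-1)      # which class wins: integer index
box_class_scores = np.max(box_scores, axis=-1)    # how strongly it wins: float32 score

print(box_classes, box_classes.dtype)
print(box_class_scores, box_class_scores.dtype)

threshold = 0.6
# Correct: threshold the float scores, not the integer class indices.
# Comparing a class index such as 1 with 0.6 would be meaningless.
mask = box_class_scores >= threshold
print(mask)
```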

I got it to work. I ended up using argmax for box_classes and np.max for box_class_scores. Also, I was using box_classes to calculate my box_class_scores. It took me a second to fully understand that we are supposed to be getting box_class_scores from box_scores. Can you provide further detail on why this is?

I added some print statements to my yolo_filter_boxes function to show the shapes and data types of the inputs and the generated values. Here’s what I get:

boxes.shape (19, 19, 5, 4)
boxes.dtype <dtype: 'float32'>
box_scores.shape (19, 19, 5, 80)
box_scores.dtype <dtype: 'float32'>
box_classes.shape (19, 19, 5)
box_classes.dtype <dtype: 'int64'>
box_class_scores.shape (19, 19, 5)
box_class_scores.dtype <dtype: 'float32'>
filtering_mask.shape (19, 19, 5)
filtering_mask.dtype <dtype: 'bool'>
sum(filtering_mask) = 1789
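To see how those shapes fit together, here is a rough NumPy sketch, assuming random stand-in data with the printed shapes, a stand-in confidence of shape (19, 19, 5, 1), and a threshold of 0.6 (assumed here). Boolean indexing plays the role of tf.boolean_mask: the (19, 19, 5) mask selects whole boxes, so applying it to boxes keeps the trailing axis of 4 coordinates, and applying it to box_class_scores or box_classes gives one entry per kept box. Because the data is random, the kept count will differ from the 1789 printed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins with the shapes printed above (not the notebook's test data).
# Scores are a per-box confidence times per-class values.
boxes = rng.normal(size=(19, 19, 5, 4)).astype(np.float32)
box_confidence = rng.random(size=(19, 19, 5, 1))
box_scores = (box_confidence * rng.random(size=(19, 19, 5, 80))).astype(np.float32)

box_classes = np.argmax(box_scores, axis=-1)      # (19, 19, 5), integer indices
box_class_scores = np.max(box_scores, axis=-1)    # (19, 19, 5), float32

threshold = 0.6
filtering_mask = box_class_scores >= threshold    # (19, 19, 5), bool

# A mask over the first three axes keeps the trailing axes,
# similar to what tf.boolean_mask does
scores = box_class_scores[filtering_mask]         # (n_kept,)
kept_boxes = boxes[filtering_mask]                # (n_kept, 4)
classes = box_classes[filtering_mask]             # (n_kept,)

n_kept = int(filtering_mask.sum())
print(n_kept, scores.shape, kept_boxes.shape, classes.shape)
```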

Understanding why you use argmax to get the classes and reduce_max to get the scores requires understanding the meaning of the data.

The input tensor box_class_probs has shape 19 x 19 x 5 x 80. For each combination of the first three indices (h, w, anchor_box) you get an 80-element vector that is essentially a softmax output over 80 types of object. The entry out of those 80 that has the highest value tells us which class of object is most likely contained in the corresponding bounding box (which is different from the anchor box). What argmax gives you is the index of the entry with the highest value, right? So that’s a value between 0 and 79 (an integer) that identifies the class. The actual score for that class is the maximum floating point value, but only after the class probabilities have been multiplied by the corresponding “confidence” value; that product is box_scores, which is why you take the score from box_scores. To get that score you use reduce_max on box_scores.
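The reasoning above can be sketched for a single box in NumPy, assuming made-up numbers: a confidence of 0.8 and a 4-class probability vector instead of 80 classes. The per-class score is confidence times class probability (box_scores); argmax picks the class index and max picks its score. Since the confidence is one non-negative number per box, scaling by it does not change which class wins, so argmax of box_scores agrees with argmax of box_class_probs whenever the confidence is positive. In the assignment itself the TensorFlow counterparts would be tf.math.argmax and tf.math.reduce_max.

```python
import numpy as np

# One box, 4 classes instead of 80 (made-up values)
box_confidence = np.array([0.8], dtype=np.float32)                     # P(object in box)
box_class_probs = np.array([0.1, 0.6, 0.25, 0.05], dtype=np.float32)   # softmax output

# Per-class score = confidence * class probability
box_scores = box_confidence * box_class_probs      # approx. [0.08, 0.48, 0.2, 0.04]

box_class = np.argmax(box_scores, axis=-1)         # index of the best class
box_class_score = np.max(box_scores, axis=-1)      # its score, already scaled by confidence

# Scaling by a positive confidence does not change which class wins
assert box_class == np.argmax(box_class_probs)
print(box_class, box_class_score)
```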

All this information about the structure and meaning of the data was explained in the notebook. If my explanation above is not enough, I suggest you read over the notebook from the beginning again with what I said above in mind.


Thank you so much. I understand it a lot better now.