In exercise 1, I am having trouble creating the mask in this problem. I believe I have the correct dimensions for box_class_scores and box_classes. I printed the shapes. Here is the error I am receiving.

Not quite sure what I am doing wrong:

That probably means you’ve mixed up the computation of “scores” versus “classes”. Note that scores should be floating-point values and classes should be integers. The error tells you that your `box_class_scores` are actually integers. So have a more careful look at that …
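To see why the two quantities have different types, here is a minimal NumPy sketch for a single box (the assignment uses TensorFlow's `tf.math.argmax` and `tf.math.reduce_max`, which behave analogously). The 4-class score vector is made up for illustration; the real model has 80 classes.

```python
import numpy as np

# Hypothetical scores for one box over 4 classes (the real model uses 80)
box_scores_one_box = np.array([0.05, 0.70, 0.15, 0.10], dtype=np.float32)

# argmax returns the *position* of the largest entry, so it is an integer
best_class = np.argmax(box_scores_one_box)

# max returns the largest entry *itself*, so it keeps the float dtype
best_score = np.max(box_scores_one_box)

print(best_class, best_class.dtype)
print(best_score, best_score.dtype)
```

If your scores come out as integers, you have most likely applied `argmax` where `max` was intended, or computed the scores from the class indices.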

I got it to work. I ended up using argmax for box_classes and np.max for box_class_scores. Also, I was using box_classes to calculate my box_class_scores. It took me a second to fully understand that we are supposed to be getting box_class_scores from box_scores. Can you provide further detail on why this is?

I added some print statements to my `yolo_filter_boxes` function to show the shapes and data types of the inputs and the generated values. Here’s what I get:

```
boxes.shape (19, 19, 5, 4)
boxes.dtype <dtype: 'float32'>
box_scores.shape (19, 19, 5, 80)
box_scores.dtype <dtype: 'float32'>
box_classes.shape (19, 19, 5)
box_classes.dtype <dtype: 'int64'>
box_class_scores.shape (19, 19, 5)
box_class_scores.dtype <dtype: 'float32'>
filtering_mask.shape (19, 19, 5)
filtering_mask.dtype <dtype: 'bool'>
sum(filtering_mask) = 1789
```

Understanding why you use `argmax` to get the classes and `reduce_max` to get the scores requires understanding the meaning of the data.

The input tensor `box_class_probs` has shape 19 × 19 × 5 × 80. For each combination of the first three indices (h, w, anchor_box) you get an 80-element vector that is essentially a softmax output over 80 object classes. The entry with the highest value tells us which class of object is most likely contained in the corresponding bounding box (which is different from the anchor box). What `argmax` gives you is the index of that entry, right? So that’s an integer between 0 and 79 that identifies the class. The actual score for that class is the maximum floating-point value, but only after multiplying by the corresponding “confidence” value; that product is `box_scores`, and you take both `argmax` and `reduce_max` over its last axis. Because the confidence is a single positive number per box, multiplying by it does not change which class wins, but it does scale the score that is later compared against the threshold. To get that score you use `reduce_max`.
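The steps described above can be sketched end to end in NumPy. This is a hypothetical stand-in, not the graded TensorFlow code: the function name `filter_boxes_sketch`, the default threshold of 0.6, the `(19, 19, 5, 1)` shape for the confidence tensor, and the random input data are all assumptions chosen to match the shapes in the printout above.

```python
import numpy as np

def filter_boxes_sketch(box_confidence, boxes, box_class_probs, threshold=0.6):
    """Hypothetical NumPy sketch of the score/class/mask steps.

    box_confidence:  (19, 19, 5, 1)  assumed: probability that each box holds an object
    boxes:           (19, 19, 5, 4)  box coordinates
    box_class_probs: (19, 19, 5, 80) per-class probabilities for each box
    threshold:       assumed default; boxes scoring below it are dropped
    """
    # Broadcast the single confidence value across the 80 class probabilities
    box_scores = box_confidence * box_class_probs       # (19, 19, 5, 80), float
    # Index of the most likely class for each box: an integer in [0, 79]
    box_classes = np.argmax(box_scores, axis=-1)        # (19, 19, 5), int
    # Score of that most likely class: a float
    box_class_scores = np.max(box_scores, axis=-1)      # (19, 19, 5), float
    # Boolean mask: keep only boxes whose best score clears the threshold
    filtering_mask = box_class_scores >= threshold      # (19, 19, 5), bool
    return (box_class_scores[filtering_mask],
            boxes[filtering_mask],
            box_classes[filtering_mask])

# Random stand-in inputs (not real model output)
rng = np.random.default_rng(0)
box_confidence = rng.random((19, 19, 5, 1), dtype=np.float32)
boxes = rng.random((19, 19, 5, 4), dtype=np.float32)
logits = rng.normal(size=(19, 19, 5, 80)).astype(np.float32)
# Softmax over the last axis so each 80-element vector sums to 1
box_class_probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

scores, kept_boxes, classes = filter_boxes_sketch(box_confidence, boxes, box_class_probs)
print(scores.shape, kept_boxes.shape, classes.shape)
```

With random inputs like these, few or no boxes may clear a 0.6 threshold; the point is the shapes and dtypes, which should line up with the printout above: integer classes from `argmax`, float scores from the max, and a boolean mask of shape (19, 19, 5).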

All this information about the structure and meaning of the data was explained in the notebook. If my explanation above is not enough, I suggest you read over the notebook from the beginning again with what I said above in mind.


Thank you so much. I understand it a lot better now.