Week 3, course 4, programming assignment 1, YOLO filter

I am struggling with the first part of this coding assignment, where (I believe) I am supposed to get the class indices that have the highest scores and use them to retrieve those scores. My tensor box_classes has shape (19,19,5) with values in the range 0-79, and box_scores has shape (19,19,5,80). I need to use box_classes to select elements of box_scores. I tried box_scores[box_classes], and this gives the error "Expected begin, end, and strides to be 1D equal size tensors, but got shapes [1,19,19,5], [1,19,19,5], and [1] instead." I expected an array with the same shape as box_classes. How can I use box_classes to select elements of box_scores?

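The solution is to use tf.math.argmax along the last axis to get the class indices and tf.math.reduce_max along the same axis to get the max scores, so there is no need to index box_scores with box_classes.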
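For anyone landing here with the same error, here is a minimal sketch of that approach. The tensors are random stand-ins with the shapes mentioned in the question, not the assignment's real data, and the variable name box_class_scores is only an assumption for illustration.

```python
import tensorflow as tf

# Random stand-ins for the assignment's tensors (values are made up).
tf.random.set_seed(0)
box_confidence = tf.random.uniform((19, 19, 5, 1))
box_class_probs = tf.random.uniform((19, 19, 5, 80))

# Broadcasting the trailing 1 against 80 gives one score per class per box.
box_scores = box_confidence * box_class_probs

# Reduce over the class axis (the last one): one index and one score per box.
box_classes = tf.math.argmax(box_scores, axis=-1)            # int64 class ids
box_class_scores = tf.math.reduce_max(box_scores, axis=-1)   # float32 scores

print(box_scores.shape, box_classes.shape, box_class_scores.shape)
```

Both results drop the class axis, so they have the shape of box_classes that the question was expecting from the indexing attempt.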


Hi. I’m failing to understand something in Exercise 1, YOLO filter boxes. We are asked to find the index of the max score and then get the score.

First step of the exercise is to multiply the box_confidence by the box_class_probs which I may have done correctly.

Next we want to find the highest score out of all the possible classes for each box. The suggested tools are tf.math.argmax and tf.math.reduce_max. argmax returns the index with the largest value across axes of a tensor. So when I apply it to the tensor box_scores, I’m expecting to get the index for the class with the highest probability for each of the boxes. I would think it would have the same shape as box_scores, which is (19, 19, 5, 80).

Instead I am getting the shape (19, 19, 5) for box_classes. I’m not seeing how an index with shape (19, 19, 5) can be used to tell me the index of the highest value in the last dimension of a tensor with shape (19, 19, 5, 80). What am I missing?

Am I wrong to have expected a shape of (19, 19, 5, 80) for box classes?

I would expect from there to use the indexes in box_classes to select from box_scores and get a tensor of shape (19, 19, 5, 80) with the scores zeroed out for the indexes that were not the maximum. It seems as if you would need all 80 entries because the position of the max value in the last dimension tells you which class the object was classified as. Is this not right?
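A small sketch may help with the shape question. It uses a toy tensor of 2 boxes and 3 classes with made-up numbers instead of (19, 19, 5, 80). argmax along the last axis stores the position of the maximum as an integer value, so that axis is no longer needed. The zeroed-out form described above carries the same information and can be rebuilt with tf.one_hot.

```python
import tensorflow as tf

# Toy scores: 2 boxes, 3 classes (made-up numbers).
scores = tf.constant([[0.1, 0.7, 0.2],
                      [0.5, 0.3, 0.4]])

# One integer per box: the position of the max along the class axis.
classes = tf.math.argmax(scores, axis=-1)

# The same information expanded back to one slot per class.
mask = tf.one_hot(classes, depth=3)

# Scores with every non-maximum entry zeroed out.
masked = scores * mask

print(classes.numpy())  # [1 0]
print(masked.numpy())
```

The integer in classes is the class id itself, which is why one value per box is enough.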

Which leads me to another question. Where does tf.math.reduce_max come in? It computes the maximum of elements across dimensions of a tensor. If I already have the indexes where the max occurs, why do I need to compute the max element? Why not just use the index?
Thanks for your help and have a great day! George
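On the "why not just use the index" question, a rough sketch suggests both routes give the same scores. reduce_max does it in one call, while the index route needs a batched tf.gather. The tensor below is random and only mimics the shapes from this thread. The names by_max and by_index are made up for illustration.

```python
import tensorflow as tf

tf.random.set_seed(1)
box_scores = tf.random.uniform((19, 19, 5, 80))  # made-up scores

box_classes = tf.math.argmax(box_scores, axis=-1)

# Route 1: take the maximum over the class axis directly.
by_max = tf.math.reduce_max(box_scores, axis=-1)

# Route 2: look the score up by index. The first three axes are treated as
# batch dimensions, and one entry is picked from the class axis per box.
picked = tf.gather(box_scores, box_classes[..., tf.newaxis],
                   axis=3, batch_dims=3)          # shape (19, 19, 5, 1)
by_index = tf.squeeze(picked, axis=-1)            # shape (19, 19, 5)

print(bool(tf.reduce_all(by_index == by_max)))
```

So the index is still needed to know which class was detected, and reduce_max is simply the shorter way to get the matching score.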

Solved it. No need to answer.