I am confused about the reduce_max operation required in the assignment, especially concerning the axis. Intuitively, this operation should be taking every box (5) of every cell (19x19), and finding the max probability class, thus collapsing the 4th dimension, and leaving us with a 5x19x19. However, when I experimented with the reduce_max function on a 3D example, using axis=-1 (or axis=2), it didn’t give me what I wanted. Instead, it was axis=0, which I thought was supposed to be column-wise, that gave me the results I sought. Please help give me clarification on the mechanics of this. Thank you!

Can anyone help me? I want to understand this.

First of all, it helps to state exactly which assignment you are asking about if you want quick assistance, since there are several assignments in Course 4.

So, assuming that you are talking about W3A1, the first assignment in week 3, I think your dimension analysis is incorrect. It is better to first understand the overall flow in YOLO.

I assume that you are talking about `reduce_max()` in `yolo_filter_boxes()`. If this is not the case, please clarify your point.

In `yolo_filter_boxes()`, what you get from the caller are `box_confidence`, `box_class_probs`, and the bounding box information created from `box_xy` and `box_wh`.

What we want to pick is the most probable class, using “confidence” * “class probabilities”.
The dimension of `box_scores` is (19,19,5,80), so we want to pick the max value from the last dimension.
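A quick shape check of the step above, as a rough sketch. It uses NumPy with random values instead of real YOLO outputs, and `np.max` stands in for `tf.math.reduce_max` (both reduce along the given axis). The (19,19,5,1) shape for `box_confidence` is an assumption here; only the (19,19,5,80) shape of `box_scores` is stated in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the network outputs (shapes are assumptions).
box_confidence = rng.random((19, 19, 5, 1))    # one confidence per anchor box
box_class_probs = rng.random((19, 19, 5, 80))  # 80 class probabilities per box

# Broadcasting multiplies each box's confidence into its 80 class probabilities.
box_scores = box_confidence * box_class_probs
print(box_scores.shape)    # (19, 19, 5, 80)

# Reducing over the last axis keeps one value (the best score) per box.
best_scores = np.max(box_scores, axis=-1)
print(best_scores.shape)   # (19, 19, 5)
```

The first three axes (grid row, grid column, anchor box) survive; only the 80-class axis is collapsed.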

Hope this helps.

For argmax, I am also confused:

A=[[[3,2],[3,4]],[[1,6],[7,8]],[[51,5],[4,9]]]
print(tf.argmax(A,axis=2))

gave me

tf.Tensor(
[[0 1]
[1 1]
[0 1]], shape=(3, 2), dtype=int64),

but shouldn’t it be [[1,1],[1,1],[0,0]]?

What you defined is a 3x2x2 array like this.

Also, what you created is a list, so convert it to `np.array` to check the shape.

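``````
import numpy as np

A = [[[3,2],[3,4]],[[1,6],[7,8]],[[51,5],[4,9]]]
print(np.argmax(A, axis=2))   # np.argmax accepts a nested list directly
b = np.array(A)               # convert to an array so we can inspect the shape
print("b.shape =" + str(b.shape))
print(np.argmax(b, axis=2))
``````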
``````
[[0 1]
 [1 1]
 [0 1]]
b.shape =(3, 2, 2)
[[0 1]
 [1 1]
 [0 1]]
``````

As you see, the orange box is selected as the max value for axis=2, and the associated index is returned.

If you take slices along that axis, you can see that the above assignment is correct (“channel” is “depth”, i.e., axis=2).

``````
print("channel 0 =" + str(b[:,:,0]))
print("channel 1 =" + str(b[:,:,1]))
``````
``````
channel 0 =[[ 3  3]
 [ 1  7]
 [51  4]]
channel 1 =[[2 4]
 [6 8]
 [5 9]]
``````
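To make the meaning of the returned indices concrete, here is a small sketch that walks over the same array and prints, for every position `(i, j)`, the innermost pair and which of its two entries `argmax` picked. The loop and the variable names `idx` and `pair` are only for illustration.

```python
import numpy as np

A = [[[3, 2], [3, 4]], [[1, 6], [7, 8]], [[51, 5], [4, 9]]]
b = np.array(A)

idx = np.argmax(b, axis=2)

# idx[i, j] answers: "inside the innermost pair b[i, j], which position holds the larger value?"
for i in range(b.shape[0]):
    for j in range(b.shape[1]):
        pair = b[i, j]
        print(f"b[{i},{j}] = {pair} -> index {idx[i, j]} (value {pair[idx[i, j]]})")
```

For example, `b[0,0]` is `[3 2]`, so the index is 0, and `b[2,1]` is `[4 9]`, so the index is 1. Each index is a position within the last axis, not a row or block number.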

Hope this helps.

Wouldn’t it be like that but flipped, so that [3,2] is the first row of the first “layer” (axis 3)?

I thought the 3rd axis would be like z (depth)

Yes, 3rd axis (axis=2) is depth (channel).

So what exactly are the indices that are returned by the function saying?

I suppose we can go back to your original question.

The input to `tf.math.reduce_max` is `box_scores`. Its shape is 19x19x5x80. As you see from my figure above, it holds a score for each of 80 classes, for each of 5 anchor boxes. We need to find the max score for each anchor, so we set axis=-1.
Then, the result is one max value per anchor box, and the last dimension is reduced. That value is the score of the most probable class (like car, cat) detected by that anchor; the index of that class comes from `argmax` over the same axis. We have 5 anchors, so the result is 19x19x5. This is the expected result, since we have a 19x19 grid with 5 anchors per cell, and each anchor now carries the score of its most probable class.
After a threshold check on these scores, we can start non-max suppression to finalize the bounding boxes for the detected objects.
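The max, argmax, and threshold steps can be sketched roughly as follows. This is not the assignment solution: it uses NumPy on random numbers, the names `box_classes`, `box_class_scores`, and `mask` are chosen for illustration, the threshold value is hypothetical, and plain boolean indexing stands in for whatever masking call the assignment expects.

```python
import numpy as np

rng = np.random.default_rng(1)
box_scores = rng.random((19, 19, 5, 80))         # random stand-in, not real scores

box_classes = np.argmax(box_scores, axis=-1)     # index of the best class per box
box_class_scores = np.max(box_scores, axis=-1)   # score of that class per box

threshold = 0.99                                 # hypothetical value for illustration
mask = box_class_scores >= threshold             # boolean mask, shape (19, 19, 5)

kept_scores = box_class_scores[mask]             # 1-D: only the boxes that passed
kept_classes = box_classes[mask]                 # class index for each kept box

print(box_classes.shape, box_class_scores.shape, mask.shape)
print(kept_scores.shape == kept_classes.shape)
```

Note that the max and the argmax are taken over the same axis, so each kept score lines up with its class index.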

Hope this helps.

I understand. However, the argmax indices I got don’t make sense.

`the argmax indices I got don’t make sense.`

On that point, I am afraid we cannot help much. Please play with a NumPy array and take some slices. Then you will see how NumPy assigns values to each axis. Even in your example, if you take the slice for depth=0, then, as I showed, you will get the following values. All of this follows from your definition of A.

``````
array([[ 3,  3],
       [ 1,  7],
       [51,  4]])
``````

This is still a simple case. We usually handle higher-dimensional tensors, like 4D or 5D. So, the important thing is to understand the shape and find the target axis to reduce.
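One way to build that intuition is to reduce a small 4D array along each axis in turn and watch which dimension disappears. This is just a sketch with an arbitrary (2, 3, 4, 5) shape chosen for illustration.

```python
import numpy as np

x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)   # small 4-D example

# Reducing along an axis removes exactly that dimension from the shape.
for axis in range(x.ndim):
    reduced = np.max(x, axis=axis)
    print(f"axis={axis}: {x.shape} -> {reduced.shape}")

# A negative axis counts from the end, so axis=-1 is the same as axis=3 here.
same = np.array_equal(np.max(x, axis=-1), np.max(x, axis=3))
print(same)
```

The same rule applies to the (19,19,5,80) case: reducing over axis=-1 drops the 80 and leaves (19,19,5).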

What is the difference between an array and a list?

I am understanding everything a lot better now. Thank you so much for your detailed help!