YOLO assignment exercise 1 shape question

Hello,

For exercise 1 in the YOLO assignment, the tensor box_class_scores is of the shape (19,19,5) and the boolean mask filtering_mask is also of the shape (19,19,5). However, after doing scores = tf.boolean_mask(box_class_scores, filtering_mask), the output shape is (1789, ) and I’m not sure where this 1789 came from. I was expecting the shape to be (1805,) where 1805 = 19*19*5.
I must have missed something here. Thank you in advance for helping out!

Hi Peeta_Li,

tf.boolean_mask extracts the positions in box_class_scores where filtering_mask is True. So, in your case filtering_mask has 1789 Trues and 16 Falses.
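
Here is a minimal sketch of that, in case it helps. The random scores and the 0.5 threshold are my own stand-ins, not the assignment's actual values, so the count will differ from 1789. The point is only that the output length equals the number of True entries in the mask.

```python
import tensorflow as tf

# Hypothetical stand-ins for the assignment's tensors (values are assumptions).
tf.random.set_seed(0)
box_class_scores = tf.random.uniform((19, 19, 5))
filtering_mask = box_class_scores >= 0.5  # boolean tensor, shape (19, 19, 5)

# Keep only the entries where the mask is True; the result is 1-D.
scores = tf.boolean_mask(box_class_scores, filtering_mask)

# Count the True entries in the mask.
num_true = int(tf.reduce_sum(tf.cast(filtering_mask, tf.int32)))

print(scores.shape, num_true)  # the single dimension of scores equals num_true
```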

Cheers

Thank you, that totally makes sense!

Thank you for the clarification. So the output holds the elements at the positions where the mask value is True, and it is then unrolled into shape (1789,). Is this flattening a consequence of leaving axis at its default of 0?

What was the logic for not somehow keeping the original tensor dimensions?

I don’t love the use of position there. The output does contain the elements of the input that satisfy the filter mask, but position information is not retained. Maybe think of it almost as whitespace removed?

The TF doc describes the output as a tensor populated by entries in [input tensor] corresponding to True values in [mask]. It is not sparse, meaning the False values are dropped entirely.
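
On the axis question, a small sketch may help. As far as I can tell, the flattening comes from the mask having the same rank as the tensor, not from the axis default: every dimension the mask covers is collapsed into one, and dimensions it does not cover are kept. The tensor values and the masks below are made up for illustration.

```python
import tensorflow as tf

# Illustrative tensor shaped like box_class_scores; values are just 0..1804.
tensor = tf.reshape(tf.range(19 * 19 * 5), (19, 19, 5))

# Mask with the same rank as the tensor: all three dimensions collapse into one.
full_mask = tf.equal(tensor % 2, 0)
flat = tf.boolean_mask(tensor, full_mask)

# Mask over the first two dimensions only: the trailing dimension of 5 survives.
cell_mask = tf.equal(tensor[..., 0] % 2, 0)
per_cell = tf.boolean_mask(tensor, cell_mask)

# 1-D mask applied along axis=2: the leading (19, 19) dimensions survive.
anchor_mask = tf.constant([True, False, True, False, False])
per_anchor = tf.boolean_mask(tensor, anchor_mask, axis=2)

print(flat.shape, per_cell.shape, per_anchor.shape)
```

So with a (19,19,5) mask on a (19,19,5) tensor, a 1-D result is expected whatever axis is left at.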

What was the logic for not somehow keeping the original tensor dimensions?

The problem is that you do not have the same number of elements as before. Imagine a simple 3x3 tensor, so 9 elements. If you remove one element, there are 8 elements remaining so you cannot get a 3x3 tensor anymore. You may be able to reshape it to 4x2, 2x4, 1x8, 8x1, but it is arguably better to leave that decision to the user.
If you are using tf.boolean_mask the assumption is that you want to extract the elements and therefore you do not care about the shape anymore. If you want to specify the locations of interest and keep the dimension of the original tensor, then a mask would give you that (actually filtering_mask has that information).
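
To make the 3x3 case concrete, here is a sketch contrasting the two options. Using tf.where with a zero fill value is my choice for illustration; any placeholder value would do.

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])
mask = tf.not_equal(x, 5.0)  # True everywhere except the centre element

# Extraction: 8 elements remain, so the 3x3 shape cannot be preserved.
extracted = tf.boolean_mask(x, mask)

# Shape-preserving alternative: keep the 3x3 layout and fill masked-out
# entries with a placeholder (zero here, an arbitrary choice).
kept_shape = tf.where(mask, x, tf.zeros_like(x))

print(extracted.shape, kept_shape.shape)
```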

Hope it was clear.

Thank you for the explanation.

@isaac.casm @ai_curious
Why are there 1789 Trues and 16 Falses?

I totally missed this message with the Christmas holidays.
To be absolutely fair, I am not sure, because I don’t know where that code comes from. My guess is that these are the class probabilities of the model output. If that is the case, then most likely the model was still being trained, and that is the reason for having such a large number of Trues. But this number will change depending on the input image.

At this point in the class exercise the values are just random numbers, so the results depend entirely on the sample distribution specified in the call to the generator. It’s not realistic, so you can ignore the numeric values here; only the shapes are reasonable.
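
A rough sketch of how much the count depends on the distribution. The means, standard deviation, and 0.5 threshold below are assumptions picked to make the effect obvious, not the notebook's actual parameters, and count_kept is a hypothetical helper.

```python
import tensorflow as tf

def count_kept(mean, threshold=0.5):
    """Return how many of 19*19*5 random scores pass the threshold."""
    # Random scores drawn from a normal distribution (parameters are assumptions).
    scores = tf.random.normal((19, 19, 5), mean=mean, stddev=1.0, seed=1)
    mask = scores >= threshold
    kept = tf.boolean_mask(scores, mask)
    return int(kept.shape[0])

low = count_kept(-10.0)   # scores far below the threshold
mid = count_kept(0.5)     # scores centred on the threshold
high = count_kept(10.0)   # scores far above the threshold

print(low, mid, high)
```

Shifting the mean moves the count anywhere between 0 and 1805, which is why a number like 1789 says nothing about a real image.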