Good day,

I need a refresher on tensor dimensions, and I want to make sure I understand them correctly.

My understanding is as follows:

boxes – tensor of shape (19, 19, 5, 4) - the first 2 dimensions are the dimensions of the image after encoding, i.e. the 19 x 19 grid. The 3rd dimension represents the 5 anchor boxes for each of these 19^2 grid cells, and the 4 represents bx, by, bh, bw for each of the 5 anchor boxes in each of these 19^2 grid cells.

box_confidence – tensor of shape (19, 19, 5, 1) - This encodes, for each anchor box in each grid cell, the confidence that some object has been detected.

box_class_probs – tensor of shape (19, 19, 5, 80) - This encodes, for each of the 5 anchor boxes in each of the 19^2 grid cells, the probability that each of the 80 classes is present in that anchor box. So for a particular grid cell, and for each of the 5 anchor boxes in that grid cell, there is an 80-dimensional vector holding the class probabilities for the 80 classes.
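
To make the indexing concrete, here is a small NumPy sketch of how I picture these three tensors. The random values are only placeholders, and the grid cell (3, 7) and anchor index 2 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data with the shapes described above (values are random, not real model output).
boxes = rng.random((19, 19, 5, 4))             # bx, by, bh, bw per anchor per grid cell
box_confidence = rng.random((19, 19, 5, 1))    # one confidence value per anchor per grid cell
box_class_probs = rng.random((19, 19, 5, 80))  # 80 class probabilities per anchor per grid cell

# Pick one grid cell (row 3, column 7) and one anchor box (index 2).
cell_anchor_box = boxes[3, 7, 2]               # shape (4,)
cell_anchor_conf = box_confidence[3, 7, 2]     # shape (1,)
cell_anchor_probs = box_class_probs[3, 7, 2]   # shape (80,)

print(cell_anchor_box.shape, cell_anchor_conf.shape, cell_anchor_probs.shape)
```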

Next we are told that box_scores has shape (19, 19, 5, 80). This is where my understanding first breaks down: why do we need to calculate box_scores? Surely box_class_probs already encodes all the necessary information?

Now I get the following shapes:

Box scores shapes (19, 19, 5, 80)

Box classes [19 19 5]

Box class scores [19 19 5]

I understand how we get the shape of box_scores, but I don’t quite understand what information the other two encode. My guess is that box_classes holds, for each of the 5 anchor boxes in each of the 19^2 grid cells, the class with the highest probability, and that box_class_scores holds the corresponding score for each of those classes.
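
Here is how I picture those three shapes arising, as a NumPy sketch rather than the actual assignment code. It assumes box_scores is the elementwise product of box_confidence and box_class_probs, and that the other two come from an argmax and a max over the last (class) axis. The random inputs are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder inputs with the shapes from the question (random values, not model output).
box_confidence = rng.random((19, 19, 5, 1))
box_class_probs = rng.random((19, 19, 5, 80))

# Assumption: scores are confidence times class probability.
# The trailing axis of size 1 broadcasts against the 80 classes.
box_scores = box_confidence * box_class_probs        # shape (19, 19, 5, 80)

# Index of the best-scoring class for every anchor box in every grid cell.
box_classes = np.argmax(box_scores, axis=-1)         # shape (19, 19, 5)

# The score of that best class.
box_class_scores = np.max(box_scores, axis=-1)       # shape (19, 19, 5)

print(box_scores.shape, box_classes.shape, box_class_scores.shape)
```

Reducing over the last axis is what drops the 80, which would explain why both results have shape (19, 19, 5).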

Is my understanding mostly correct?

I also need a hint on how to make the filtering mask have the same dimensions as box_class_scores.
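
My current guess, written as a NumPy sketch so it can be checked: comparing box_class_scores elementwise against a threshold should already give a boolean array of the same shape. The threshold of 0.6 and the random inputs are placeholders I made up, and NumPy boolean indexing stands in for whatever masking function the framework provides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder inputs (random values, shapes as in the question).
boxes = rng.random((19, 19, 5, 4))
box_class_scores = rng.random((19, 19, 5))
box_classes = rng.integers(0, 80, size=(19, 19, 5))

threshold = 0.6  # hypothetical value, chosen only for illustration

# An elementwise comparison keeps the shape of box_class_scores.
filtering_mask = box_class_scores >= threshold       # boolean, shape (19, 19, 5)

# Applying the mask keeps only the entries where it is True.
kept_scores = box_class_scores[filtering_mask]       # shape (n,)
kept_boxes = boxes[filtering_mask]                   # shape (n, 4)
kept_classes = box_classes[filtering_mask]           # shape (n,)

print(filtering_mask.shape, kept_scores.shape, kept_boxes.shape, kept_classes.shape)
```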