How exactly is a (19,19,5,85) tensor mathematically equivalent to (19,19,425), and how is that equivalent to three variables with dimensions box_confidence: (19,19,5,1), boxes: (19,19,5,4), box_class_probs: (19,19,5,80)? If it is broadcasting, it should only get broadcast to the box_class_probs shape.
1 is for the predicted object presence probability p_c
4 is for the predicted bounding box center location (b_x, b_y) and shape (b_w,b_h)
80 is for the number of classes in the MS COCO (Common Objects in Context) dataset
19*19 is the number of grid cells
5 is the number of anchor boxes
85 = (1 + 4 + 80)
425 = 5 * 85
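The total number of values in the training input and in the network output is 19*19*5*(1+4+80) = 153,425, which you are free to stack or flatten into any shape that is convenient. Going between (19,19,5,85) and (19,19,425) is a reshape, not broadcasting: the same 153,425 values are kept and only their grouping changes.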
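To make the counting concrete, here is a minimal NumPy sketch. The array `flat` is just a hypothetical stand-in for the network output (filled with 0, 1, 2, ... so each value is identifiable), and the constant names are my own, not from the course code.

```python
import numpy as np

GRID, ANCHORS, CLASSES = 19, 5, 80
VALUES_PER_BOX = 1 + 4 + CLASSES  # p_c + (b_x, b_y, b_w, b_h) + class scores = 85

# Hypothetical stand-in for the output: one distinct number per slot.
flat = np.arange(GRID * GRID * ANCHORS * VALUES_PER_BOX)

# Same values, two different groupings. Reshape never adds or copies values.
out_3d = flat.reshape(GRID, GRID, ANCHORS * VALUES_PER_BOX)   # (19, 19, 425)
out_4d = out_3d.reshape(GRID, GRID, ANCHORS, VALUES_PER_BOX)  # (19, 19, 5, 85)

print(flat.size, out_3d.shape, out_4d.shape)

# With NumPy's default row-major order, value v of anchor a in the 4D view
# sits at index a * 85 + v in the 425-long vector of the 3D view.
same = out_4d[3, 7, 2, 10] == out_3d[3, 7, 2 * VALUES_PER_BOX + 10]
print(same)
```

The last check is the whole point: the 425-long vector behind each grid cell is just the 5 anchor boxes' 85-value vectors laid end to end.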
Take a look at this thread: Detecting Multiple Objects using YOLO - Grid Cells plus Anchor Boxes
The slicing basically strips off some or all of one of the layers of the 4D object, depending on what you need to do with the data. (NOTE: The diagrams in that post use 3 for the number of grid cells, because that is simpler to draw, but the idea is the same.) If you just want to work with the presence probability, the bounding box shape, or the class prediction, you can use Python slicing to extract those elements from the larger object. If you pull out just the box shapes, you will have a (19,19,5,4) object. If you pull out just the class vector, it will be (19,19,5,80). If you want to manipulate everything behind the grid cells in one flattened vector, it will be (19,19,425), and so on.
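Here is a rough sketch of that slicing in NumPy. I am assuming the 85 values are laid out in the order listed above (p_c first, then the 4 box values, then the 80 class values); check the actual layout in your own code. The `output` array is random placeholder data, not real predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
output = rng.random((19, 19, 5, 85))  # placeholder for the 4D output

# Assumed layout of the last axis: [p_c, b_x, b_y, b_w, b_h, c_1 ... c_80]
box_confidence = output[..., 0:1]   # (19, 19, 5, 1), 0:1 keeps the last axis
boxes = output[..., 1:5]            # (19, 19, 5, 4)
box_class_probs = output[..., 5:]   # (19, 19, 5, 80)

# The three pieces together hold exactly the original values: 1 + 4 + 80 = 85.
rebuilt = np.concatenate([box_confidence, boxes, box_class_probs], axis=-1)
print(np.array_equal(rebuilt, output))

# Broadcasting only shows up later, if you multiply the pieces:
# (19, 19, 5, 1) * (19, 19, 5, 80) gives (19, 19, 5, 80).
box_scores = box_confidence * box_class_probs
print(box_scores.shape)
```

So the split into three variables is plain slicing along the last axis. Broadcasting is a separate step that happens when the (19,19,5,1) confidence is multiplied against the (19,19,5,80) class probabilities.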