Understand yolo_boxes_to_corners

I would like to understand this function’s code. I get the concept behind converting box coordinates from grid-cell-relative to image-wide, but I do not get this code:

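import tensorflow as tf

def yolo_boxes_to_corners(box_xy, box_wh):
    """Convert YOLO box predictions to bounding box corners."""
    # Half the width/height on either side of the center gives the two corners.
    box_mins = box_xy - (box_wh / 2.)
    box_maxes = box_xy + (box_wh / 2.)

    # Reorder to (y_min, x_min, y_max, x_max) along the last axis.
    return tf.keras.backend.concatenate([
        box_mins[..., 1:2],  # y_min
        box_mins[..., 0:1],  # x_min
        box_maxes[..., 1:2],  # y_max
        box_maxes[..., 0:1]  # x_max
    ])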

Why divide by 2? What is the purpose of the ellipsis (...)? Why do these indices give y_min, x_min, and so on?

xy gives the center coordinate location. From there, the left, right, top, and bottom sides of the bounding box are each 1/2 of the width or height away. The minimum corner is to the left of and above the center, hence the subtraction. The maximum corner is to the right and below, hence the addition. (Remember that the origin is at the top left of the image, so y increases downward.) In a 2D example, if the center is at (4,4) and the width and height are both 2, the corners are (3,3), (3,5), (5,3), and (5,5). Note, however, that the shape of box_xy and box_wh isn’t actually that simple.
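To make the arithmetic concrete, here is a minimal plain-Python sketch of the (4,4) example, assuming the height is also 2. The helper name center_to_corners is made up for illustration; it is not part of the original code.

```python
def center_to_corners(cx, cy, w, h):
    """Return (x_min, y_min, x_max, y_max) for a box given its center and size."""
    half_w, half_h = w / 2.0, h / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

# Center (4, 4), width 2, height 2 (height is an assumption for this example).
x_min, y_min, x_max, y_max = center_to_corners(4, 4, 2, 2)
print((x_min, y_min), (x_max, y_max))  # (3.0, 3.0) (5.0, 5.0)
```

The function in the question does the same subtraction and addition, just on whole tensors of centers and sizes at once.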

The ... (ellipsis) is Python indexing shorthand for "take all the dimensions over here", that is, every leading dimension that you did not index explicitly. In YOLO, there are going to be bounding box predictions for the entire (S*S*B) output shape. So this code is breaking down a multi-dimensional object to extract four values: first it gets the two values for the upper left corner coordinates and the two values for the lower right corner coordinates, then it further breaks those pairs down into single values.

[..., 0:1] is the first value of an x,y pair for a corner, the x
[..., 1:2] is the second value of an x,y pair for a corner, the y
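A small NumPy sketch may help here, since NumPy and TensorFlow treat ... and slices the same way in this case. The toy shape (2, 2, 3, 2) is an assumption chosen only to keep the numbers small.

```python
import numpy as np

# Hypothetical toy tensor: a 2x2 grid, 3 boxes per cell, one (x, y) pair per box.
xy = np.arange(2 * 2 * 3 * 2, dtype=float).reshape(2, 2, 3, 2)

x = xy[..., 0:1]  # same as xy[:, :, :, 0:1]
y = xy[..., 1:2]  # same as xy[:, :, :, 1:2]

print(x.shape)           # (2, 2, 3, 1): the slice 0:1 keeps the last axis
print(xy[..., 0].shape)  # (2, 2, 3): a plain index 0 would drop it
```

Keeping that last axis of size 1 is what lets the four pieces be concatenated along it afterwards.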

Because of the ..., you collect the values for every box at once: the y coordinates for the upper lefts, the x coordinates for the upper lefts, the y coordinates for the lower rights, and the x coordinates for the lower rights (note that the code puts y before x), and concatenate them along the last axis into a new variable.

What came in as a multidimensional object containing two pairs - center and shape - goes back out as a multidimensional object containing four separate values representing the x and y of the two bounding box corners.
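The shape change can be checked with a NumPy stand-in for the function. This is only a sketch for inspecting shapes: boxes_to_corners_np is a hypothetical name, and S = 19 and B = 5 are assumed values, not something stated in this thread.

```python
import numpy as np

def boxes_to_corners_np(box_xy, box_wh):
    """NumPy stand-in for yolo_boxes_to_corners, for shape inspection only."""
    box_mins = box_xy - box_wh / 2.0
    box_maxes = box_xy + box_wh / 2.0
    return np.concatenate([
        box_mins[..., 1:2],   # y_min
        box_mins[..., 0:1],   # x_min
        box_maxes[..., 1:2],  # y_max
        box_maxes[..., 0:1],  # x_max
    ], axis=-1)

S, B = 19, 5  # assumed grid size and boxes per cell, for illustration only
# Every box centered at (x=4, y=6), 2 wide and 4 tall.
box_xy = np.broadcast_to(np.array([4.0, 6.0]), (S, S, B, 2))
box_wh = np.broadcast_to(np.array([2.0, 4.0]), (S, S, B, 2))

corners = boxes_to_corners_np(box_xy, box_wh)
print(corners.shape)     # (19, 19, 5, 4)
print(corners[0, 0, 0])  # [4. 3. 8. 5.]  i.e. y_min, x_min, y_max, x_max
```

Two tensors with a last axis of size 2 go in, and one tensor with a last axis of size 4 comes out; all the leading dimensions are untouched.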

Hope this helps.

Thanks for the explanation. I’m just wondering what S and B are when you refer to the (S*S*B) output shape.

S is the grid dimension (the image is divided into an S by S grid of cells), and B is the number of anchor boxes per grid cell. This nomenclature is standard in the YOLO papers.