Understand yolo_boxes_to_corners

hitoro · March 25, 2022, 10:48am

Hi,
I would like to understand this function’s code. I get the concept behind converting box coordinates from grid-cell-relative to image-wide, but I do not get this code:

import tensorflow as tf

def yolo_boxes_to_corners(box_xy, box_wh):
    """Convert YOLO box predictions to bounding box corners."""
    box_mins = box_xy - (box_wh / 2.)
    box_maxes = box_xy + (box_wh / 2.)

    return tf.keras.backend.concatenate([
        box_mins[..., 1:2],  # y_min
        box_mins[..., 0:1],  # x_min
        box_maxes[..., 1:2],  # y_max
        box_maxes[..., 0:1]  # x_max
    ])

Why divide by 2? What is the purpose of the ellipsis (...)? And why do these indices give y_min, x_min, and so on?

ai_curious · March 25, 2022, 12:09pm

xy gives the center coordinate location. From there, the left, right, top, and bottom sides of the bounding box are each 1/2 of the width or height away. The minimum corner is to the left of and above the center, hence the subtraction. The maximum corner is to the right and below, hence the addition. (Remember that the origin of this coordinate system is the top left, so y grows downward.) In a 2D example, if the center is at (4,4) and the width and height are both 2, the corners are (3,3), (3,5), (5,3), and (5,5). Note, however, that the shape of box_xy and box_wh isn’t actually that simple.
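The arithmetic in that 2D example can be checked with a few lines of plain Python. This is only a sketch for a single box; center_to_corners is a hypothetical helper name, not part of the assignment code.

```python
def center_to_corners(cx, cy, w, h):
    """Return the (x_min, y_min) and (x_max, y_max) corners of one box.

    Hypothetical helper for illustration: each side sits half the width
    (or half the height) away from the center.
    """
    x_min, y_min = cx - w / 2.0, cy - h / 2.0
    x_max, y_max = cx + w / 2.0, cy + h / 2.0
    return (x_min, y_min), (x_max, y_max)


# Center (4, 4), width 2, height 2
top_left, bottom_right = center_to_corners(4, 4, 2, 2)
print(top_left, bottom_right)  # (3.0, 3.0) (5.0, 5.0)
```

The other two corners, (3,5) and (5,3), are just the remaining combinations of those min and max values.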

The ... (Ellipsis) in an index is Python for "take all of the remaining dimensions here". So box_mins[..., 0:1] slices only the last axis and keeps every leading axis whole. In YOLO, there are going to be bounding box predictions for the entire (S*S*B) output shape. So this code is breaking down a multi-dimensional object to extract four values: first it gets the two values for the upper left corner coordinates and the two values for the lower right corner coordinates, then it further breaks those pairs down into single values.

[..., 0:1] is the first value of an x,y pair for a corner, the x
[..., 1:2] is the second value of an x,y pair for a corner, the y

Because of the ... you collect all the x coordinates for the bounding box upper lefts, the y coordinates for the upper lefts, the x coordinates for the lower rights, and the y coordinates for the lower rights, and concatenate them into a new variable. As the comments in the code show, the order they are concatenated in is y_min, x_min, y_max, x_max.
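To see what the ellipsis and the 0:1 / 1:2 slices do to the shape, here is a small sketch. It uses NumPy as a stand-in for TensorFlow (the indexing syntax is the same), and S = 3, B = 2 are made-up sizes chosen only to keep the arrays small.

```python
import numpy as np

S, B = 3, 2  # hypothetical grid size and boxes per cell, for illustration only

# One (x, y) pair per box per grid cell: shape (S, S, B, 2)
box_mins = np.arange(S * S * B * 2, dtype=float).reshape(S, S, B, 2)

x_min = box_mins[..., 0:1]        # same as box_mins[:, :, :, 0:1]
y_min = box_mins[..., 1:2]        # same as box_mins[:, :, :, 1:2]
x_min_dropped = box_mins[..., 0]  # a plain integer index drops the last axis

print(x_min.shape)          # (3, 3, 2, 1)
print(y_min.shape)          # (3, 3, 2, 1)
print(x_min_dropped.shape)  # (3, 3, 2)
```

Using the slice 0:1 rather than the index 0 keeps a last axis of length 1, which is what lets the four pieces be joined along that axis afterwards.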

What came in as a multidimensional object containing two pairs - center and shape - goes back out as a multidimensional object containing four separate values representing the x and y of the two bounding box corners.
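A rough NumPy analogue of the function makes the "two pairs in, four values out" point concrete. This is a sketch under assumptions: boxes_to_corners_np is a hypothetical name, NumPy replaces TensorFlow, and the sizes S = 3, B = 2 and the box values are invented for the example. Note that np.concatenate needs axis=-1 spelled out, whereas tf.keras.backend.concatenate joins along the last axis by default.

```python
import numpy as np


def boxes_to_corners_np(box_xy, box_wh):
    """NumPy sketch of yolo_boxes_to_corners (illustrative, not the course code)."""
    box_mins = box_xy - box_wh / 2.0
    box_maxes = box_xy + box_wh / 2.0
    return np.concatenate([
        box_mins[..., 1:2],   # y_min
        box_mins[..., 0:1],   # x_min
        box_maxes[..., 1:2],  # y_max
        box_maxes[..., 0:1],  # x_max
    ], axis=-1)


S, B = 3, 2  # hypothetical sizes
# Every box gets the same made-up center (x=4, y=10) and size (w=2, h=6)
box_xy = np.zeros((S, S, B, 2)) + np.array([4.0, 10.0])
box_wh = np.zeros((S, S, B, 2)) + np.array([2.0, 6.0])

corners = boxes_to_corners_np(box_xy, box_wh)
print(corners.shape)     # (3, 3, 2, 4)
print(corners[0, 0, 0])  # [ 7.  3. 13.  5.]  -> y_min, x_min, y_max, x_max
```

The leading (S, S, B) axes pass through untouched; only the last axis changes, from 2 values per input to 4 values in the output.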

Hope this helps.

hitoro · March 25, 2022, 12:27pm

Thanks for the explanation. I’m just wondering what S and B are when you write the (S*S*B) output shape.

ai_curious · March 25, 2022, 12:40pm

S is the grid dimension (the image is divided into S by S grid cells), and B is the number of anchor boxes per grid cell. This nomenclature is standard in the YOLO papers.
