For this exercise, a box is defined using its two corners: upper left (x_1, y_1) and lower right (x_2,y_2), instead of using the midpoint, height and width. This makes it a bit easier to calculate the intersection.

Looks to be x1, y1 = bottom left and x2, y2 = top right

For others who may end up reading this thread and want some help understanding why the text in the notebook is correct, here’s the story…

First, a reminder of what we’re trying to accomplish here: calculate a similarity index for two regions of an image. We’ll define this to be the ratio of the intersection area of the two regions to the union area of the two regions. To compute the union, you first need to compute the two separate region areas, which is straightforward, and the intersection area, which is the subject of this question.
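
Written as a formula, in case that helps. The notation A(·) for area is my own shorthand, not the notebook's, and B1 and B2 are the two regions. The union is the two areas added together minus the intersection, since the overlap would otherwise be counted twice:

```latex
\mathrm{IoU}(B_1, B_2) = \frac{A(B_1 \cap B_2)}{A(B_1 \cup B_2)},
\qquad
A(B_1 \cup B_2) = A(B_1) + A(B_2) - A(B_1 \cap B_2)
```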

The coordinate system chosen for this part of the exercise is corners-based, not center location- and shape-based, though you can do it either way. For comparison, the Darknet library code has IOU implemented using centers and shapes if you’re interested to see how they do it. The variables chosen to represent the corners of the intersection area are (xi1, yi1) and (xi2, yi2). The coordinate system is further defined as having the origin in the upper left. In other words…

xi1 represents the x value of the left-hand-side of the intersection area. xi2 represents the x value for the right-hand-side. yi1 represents the y value for the top of the intersection area. yi2 represents the bottom.
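
Here is a minimal sketch of how those four values turn into an IoU. It is not the notebook's solution. The function name `iou`, the tuple layout `(x1, y1, x2, y2)`, and the local variable names other than xi1, yi1, xi2, yi2 are my own assumptions for illustration. It assumes the origin is in the upper left (so y grows downward) and that each box has non-zero area.

```python
def iou(box1, box2):
    """Intersection over union of two corner-format boxes.

    Each box is (x1, y1, x2, y2) with (x1, y1) the upper-left corner and
    (x2, y2) the lower-right corner, origin in the upper left of the image.
    This layout is an assumption made for this sketch.
    """
    b1_x1, b1_y1, b1_x2, b1_y2 = box1
    b2_x1, b2_y1, b2_x2, b2_y2 = box2

    # Left edge of the intersection: the larger of the two left edges.
    xi1 = max(b1_x1, b2_x1)
    # Top edge: the larger of the two top y values (y grows downward).
    yi1 = max(b1_y1, b2_y1)
    # Right edge: the smaller of the two right edges.
    xi2 = min(b1_x2, b2_x2)
    # Bottom edge: the smaller of the two bottom y values.
    yi2 = min(b1_y2, b2_y2)

    # If the boxes do not overlap, a side length would come out negative,
    # so clamp it to zero.
    inter_width = max(xi2 - xi1, 0)
    inter_height = max(yi2 - yi1, 0)
    inter_area = inter_width * inter_height

    box1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1)
    box2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1)
    # Subtract the intersection so the overlap is not counted twice.
    union_area = box1_area + box2_area - inter_area

    return inter_area / union_area


if __name__ == "__main__":
    print(iou((1, 1, 3, 3), (2, 2, 4, 4)))
    print(iou((0, 0, 1, 1), (2, 2, 3, 3)))
```

The clamp to zero is the part that is easy to forget: without it, two boxes that miss each other on both axes would give two negative side lengths and a positive "intersection" area.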

Here it is again with all of the coordinates for the two regions B1 and B2 included: