DLS - Course 4 - W3 - bounding box coordinates

ai_curious · April 19, 2023, 6:22pm

Not quite. Grids in YOLO are fixed size, determined before training starts, and their shape is not part of the network output. The predicted bounding box shape, b_w, b_h, can be smaller than, equal to, or larger than the grid cell in which it is located.
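To make that concrete, here is a small sketch of how a labeled box relates to the fixed grid. It assumes a 608x608 input and a 19x19 grid (so 32 pixels per cell); those numbers, the function name `assign_to_cell`, and the example box are illustrative assumptions, not something the network learns or outputs.

```python
def assign_to_cell(box_xywh_px, image_size=608, grid_size=19):
    """Return the (row, col) of the cell containing the box center,
    plus the box width and height measured in grid-cell units."""
    x, y, w, h = box_xywh_px
    cell_px = image_size / grid_size  # fixed before training; 32.0 here
    col = int(x // cell_px)           # column index comes from the center x
    row = int(y // cell_px)           # row index comes from the center y
    return (row, col), (w / cell_px, h / cell_px)


# Hypothetical box: center at (300, 200) px, 160 px wide, 96 px tall.
cell, size_in_cells = assign_to_cell((300.0, 200.0, 160.0, 96.0))
print(cell, size_in_cells)
```

Only the box center decides which cell is responsible for the object. In this example the box is 5 cells wide and 3 cells tall, so it is much larger than the cell it is assigned to.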

@Mithun_Kar
If you read the original papers carefully, or some of the several YOLO threads discoverable through the one linked by @paulinpaloalto above, you’ll see that YOLO doesn’t directly predict any of b_x, b_y, b_w, or b_h. Rather, the raw floating point values it outputs go through a further transformation to produce the location and shape coordinates, and the inverse transformation must be applied to the ground truth boxes when building the training data. Other than that, they are produced exactly the same way any neural network produces any floating point output. By that I mean the labels provide ground truth values Y, the network generates predicted outputs \hat{Y}, and training minimizes a loss function that measures the difference between Y and \hat{Y}.
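For what it's worth, here is a minimal sketch of that transformation and its inverse. It assumes the YOLOv2-style parameterization: a sigmoid offset inside the cell for the center, and exponential scaling of an anchor box prior for the shape. The function names `decode` and `encode` and the example cell and anchor values are mine, for illustration only; other YOLO versions and the course code may differ in the details.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode(t, cell, prior):
    """Raw network outputs (t_x, t_y, t_w, t_h) -> box (b_x, b_y, b_w, b_h).
    All box values are in grid-cell units (assumed YOLOv2-style formulas)."""
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell            # top-left corner of the responsible cell
    p_w, p_h = prior           # anchor box (prior) width and height
    b_x = sigmoid(t_x) + c_x   # center stays inside the cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)  # shape can be smaller or larger than a cell
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


def encode(b, cell, prior):
    """Inverse of decode: ground truth box -> training targets.
    The center offset must lie strictly between 0 and 1."""
    b_x, b_y, b_w, b_h = b
    c_x, c_y = cell
    p_w, p_h = prior
    f_x, f_y = b_x - c_x, b_y - c_y
    t_x = math.log(f_x / (1.0 - f_x))  # logit, the inverse of sigmoid
    t_y = math.log(f_y / (1.0 - f_y))
    t_w = math.log(b_w / p_w)
    t_h = math.log(b_h / p_h)
    return t_x, t_y, t_w, t_h


# Hypothetical ground truth box, cell, and anchor, all in grid-cell units.
truth = (9.375, 6.25, 5.0, 3.0)
targets = encode(truth, cell=(9, 6), prior=(3.0, 2.0))
recovered = decode(targets, cell=(9, 6), prior=(3.0, 2.0))
```

Encoding the ground truth and then decoding it recovers the original box up to floating point error, which is what "the inverse transformation" means here.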

The expressions relating the b_{…} coordinates with the direct network outputs are discussed here:

Hope this helps
