DLS - Course 4 - W3 - bounding box coordinates

Seeking more clarification, and my sincere apologies if I missed any crucial information in the lecture on Object Detection algorithms in DLS Course 4 - W3.

Regarding the bounding box coordinates (b_x, b_y, b_h, and b_w): how does YOLO actually derive those coordinates? I understand how Non-Max Suppression is then applied to pick the right box, but how does YOLO derive the values in the first place?

Thank you

It is first trained on images whose objects are labelled with these coordinates.

Once training is satisfactory, it can then make estimates for similar images, i.e., output these numbers itself based on what it has learned.

In the training data, we provide all the values: b_x and b_y as the center of the object, and b_h and b_w as its height and width. After training on that data, the model then tries to predict the height and width for a grid cell where it considers the center of an object might exist.

Updated: In my message above, I mean that in grid cells (where the model thinks an object's center might be), the model tries to predict b_h and b_w for a bounding box.
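
To make that concrete, here is a minimal sketch (hypothetical numbers, not from the assignment) of how the label for one grid cell is encoded, following the lecture's convention with 3 classes:

```python
import numpy as np

# Minimal sketch of the label for ONE grid cell, following the
# lecture's convention with 3 classes (all numbers hypothetical):
#   y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]
# p_c      : 1 if an object's center falls inside this cell, else 0
# b_x, b_y : object center as fractions of the cell (between 0 and 1)
# b_h, b_w : box height/width relative to the cell (may exceed 1)
# c_1..c_3 : one-hot class indicator
y_cell = np.array([1.0,        # p_c: this cell contains an object center
                   0.4, 0.3,   # b_x, b_y: center at 40% across, 30% down
                   2.0, 1.5,   # b_h, b_w: box is 2x cell height, 1.5x cell width
                   0, 1, 0])   # one-hot class label (class 2)

# Cells whose p_c = 0 contain no object center; their remaining
# entries are "don't care" and are ignored by the loss.
```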

Best,
Saif.

As Gent and Saif have said, the YOLO algorithm learns that through training on labelled data that includes the bounding boxes. If you want to dig deeper into how the algorithm works, there are a number of detailed threads about YOLO on the forums; e.g., this one would be a good place to start, and it links to some others.


OK, thanks for your response. Much appreciated.

Thank you, Saif. Much appreciated.

Thank you, Paulin. I will go through the links.

Not quite. Grid cells in YOLO are a fixed size, determined before training starts, and their shape is not part of the network output. The predicted bounding box shape, b_w and b_h, can be smaller than, equal to, or larger than the grid cell in which the object's center is located.
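
A quick numeric illustration of that point (made-up numbers; a 608x608 input with a 19x19 grid, as an example):

```python
# Made-up numbers to show the grid is fixed while boxes are not:
image_size = 608              # assumed input resolution
grid = 19                     # assumed 19x19 grid of cells
cell_px = image_size / grid   # each cell covers 32x32 pixels

# A box predicted in one cell, with width/height in cell units:
b_w, b_h = 4.7, 3.1           # both > 1, i.e. larger than the cell
print(b_w * cell_px, b_h * cell_px)   # ~150 x ~99 pixels, vs. a 32-pixel cell
```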

@Mithun_Kar
If you read the original papers carefully, or some of the several YOLO threads discoverable through the one linked by @paulinpaloalto above, you'll see that YOLO doesn't directly predict any of b_x, b_y, b_w, or b_h. Rather, the direct floating-point values it outputs are subjected to a further transformation that generates the location and shape coordinates. The inverse transformation must be applied when establishing the training data. Other than that, they are produced exactly the same way any neural network produces any floating-point output: labels provide ground-truth values Y, the network generates predicted outputs \hat{Y}, and training minimizes a loss function of the difference Y - \hat{Y}.
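
For concreteness, here is a minimal sketch of that transformation as given in the YOLOv2 paper (the names t_x, t_y, t_w, t_h for the raw outputs, c_x, c_y for the cell offsets, and p_w, p_h for the anchor priors are the paper's, not the course notebook's):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def raw_to_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """YOLOv2-style decoding of raw network outputs into box coordinates.
    c_x, c_y: offsets of the grid cell's top-left corner;
    p_w, p_h: width/height priors of the anchor box."""
    b_x = sigmoid(t_x) + c_x   # sigmoid keeps the center inside this cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * np.exp(t_w)    # exp scales the anchor prior (always positive)
    b_h = p_h * np.exp(t_h)
    return b_x, b_y, b_w, b_h

# When building training labels, the inverse is applied to the
# ground-truth boxes, e.g. t_w = log(b_w / p_w).
```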

The expressions relating the b_{…} coordinates to the direct network outputs are discussed here:

Hope this helps

Thanks, I will go through it. Very much appreciated.

Yes, you are right. In my message above, I meant that in grid cells (where the model thinks an object's center might be), the model tries to predict b_h and b_w for a bounding box. Pardon my vague wording.

Best,
Saif.
