In the assignment there is a conversion from centroid-plus-size bounding box coordinates to box corners via the function yolo_boxes_to_corners(box_xy, box_wh). I was wondering how one could detect the (x, y) centroids in an image. Say, for example, I have the following data:
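For context, the conversion that yolo_boxes_to_corners performs can be sketched roughly like this. This is a minimal NumPy sketch, not the course's actual implementation (which uses Keras backend ops); the output ordering (y_min, x_min, y_max, x_max) is an assumption:

```python
import numpy as np

def yolo_boxes_to_corners(box_xy, box_wh):
    """Convert box centers (x, y) and sizes (w, h) to corner coordinates.

    Sketch only: returns boxes as (y_min, x_min, y_max, x_max); the
    course code may order the components differently.
    """
    box_mins = box_xy - box_wh / 2.0   # upper-left corner  (x_min, y_min)
    box_maxes = box_xy + box_wh / 2.0  # lower-right corner (x_max, y_max)
    return np.concatenate([
        box_mins[..., 1:2],   # y_min
        box_mins[..., 0:1],   # x_min
        box_maxes[..., 1:2],  # y_max
        box_maxes[..., 0:1],  # x_max
    ], axis=-1)
```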

x           y
148.782695  181.477543
96.603038   159.680978

How could I find the pixels at those coordinates in an image such as this one?

On this other recent thread, ai_curious explained how the coordinate system works for images. The principles are the same as in ordinary 2D analytic geometry, but with the orientation flipped vertically (reflected about the x axis):

The upper left corner of the image has coordinates (0, 0) and increasing x values move to the right on the x axis and increasing y values move down on the y axis.
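That convention can be demonstrated with plain NumPy indexing. A toy sketch, where the image size and the centroid value (taken from the table above) are just examples:

```python
import numpy as np

# A toy 300x200 RGB "image": 300 rows (y direction), 200 columns (x direction).
img = np.zeros((300, 200, 3), dtype=np.uint8)

# Example centroid from the data in the question.
x, y = 148.782695, 181.477543

# NumPy indexes images as [row, col], i.e. [y, x], so round and swap:
col, row = int(round(x)), int(round(y))
img[row, col] = (255, 0, 0)  # paint the centroid pixel red
```

The key point is the swap: the first array axis runs downward (y), the second runs rightward (x), matching the upper-left origin described above.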

The same way you predict (not detect) anything in deep learning: provide lots of labelled examples and a loss function that can be minimized, then iteratively update the network parameters until your predictions are acceptable.

I don’t understand the other part of the question. Coordinates are pixels. If they are not directly image-relative coordinates, then, as in YOLO, they are floating-point factors relative to a grid cell, and those can be converted back to image-relative coordinates - it’s merely a unit-of-measure conversion.
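That unit conversion can be sketched as follows. The helper name, argument names, and the assumption of a square grid are mine, not from the course code, but the arithmetic is the standard cell-relative-to-pixel conversion:

```python
def cell_to_image_coords(bx, by, col, row, grid_size, image_w, image_h):
    """Convert a YOLO cell-relative center (bx, by), each in [0, 1] within
    grid cell (col, row), to image-pixel coordinates.

    Hypothetical helper assuming a square grid_size x grid_size grid.
    """
    cell_w = image_w / grid_size
    cell_h = image_h / grid_size
    x = (col + bx) * cell_w  # pixels from the left edge
    y = (row + by) * cell_h  # pixels from the top edge
    return x, y
```

For a 19x19 grid on a 608x608 image, each cell is 32 pixels wide, so a center at (0.5, 0.25) inside cell (4, 5) lands at pixel (144.0, 168.0).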