Week 3: Car Detection with YOLO

Hello,

I just had some trouble understanding a few things about the YOLO algorithm. Could you please help?

3.5 - Run the YOLO on an Image

yolo_model_outputs = yolo_model(image_data)
yolo_outputs = yolo_head(yolo_model_outputs, anchors, len(class_names))

Here, is yolo_head()'s only purpose to convert yolo_model_outputs into boxes (in midpoint format), box confidences, and box class probabilities? Is it standard practice to go back and forth, converting from one format to another? Or was this done for illustrative purposes, or is it necessary for YOLO?

2.5 - Wrapping Up the Filtering

Lastly, I’m not quite sure what scaling means in…

boxes = scale_boxes(boxes, image_shape)

When we scale, are the boxes now represented in terms of the whole image (i.e. the upper left corner of the 720 by 1280 image represents 0,0 and lower right represents 1,1) or is the 19*19 grid fitted onto the original image and therefore the dimensions of the boxes are still represented relative to the cell the object is detected in?

Sorry if my questions aren’t very clear.

Thank you!

The code contained in yolo_head() is used twice. It is called from inside the yolo_loss() function each time that function is invoked during training, and it is called after a ‘manual’ forward propagation, such as when making predictions on a single image. Factoring the code into a function allows it to be written once and used many times. So from that point of view, I would say ‘Yes, it is standard practice.’
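
For context, here is a rough NumPy sketch of the kind of conversion a YOLO-style head performs on the raw network output. It is not the course's yolo_head(); the function name decode_head, the (x, y, w, h, confidence, classes) channel order, and the assumption that anchors are given in grid-cell units are all illustrative choices of mine, so treat the details as assumptions rather than the actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_head(raw, anchors, num_classes):
    """Hypothetical decoder, for illustration only.

    raw:     (grid_h, grid_w, num_anchors, 5 + num_classes) raw network output,
             assumed ordered as (x, y, w, h, confidence, class scores).
    anchors: (num_anchors, 2) anchor (width, height), assumed in grid-cell units.
    """
    assert raw.shape[-1] == 5 + num_classes
    grid_h, grid_w = raw.shape[:2]
    # Column (x) and row (y) index of every cell, shaped to broadcast over anchors.
    col = np.arange(grid_w).reshape(1, grid_w, 1)
    row = np.arange(grid_h).reshape(grid_h, 1, 1)
    # Box midpoint: offset inside the cell plus the cell index,
    # divided by the grid size so the result is relative to the whole image.
    box_x = (sigmoid(raw[..., 0]) + col) / grid_w
    box_y = (sigmoid(raw[..., 1]) + row) / grid_h
    # Box size: anchor size rescaled by exp of the raw value, also image-relative.
    box_w = np.exp(raw[..., 2]) * anchors[:, 0] / grid_w
    box_h = np.exp(raw[..., 3]) * anchors[:, 1] / grid_h
    box_xy = np.stack([box_x, box_y], axis=-1)
    box_wh = np.stack([box_w, box_h], axis=-1)
    box_confidence = sigmoid(raw[..., 4:5])
    box_class_probs = softmax(raw[..., 5:])
    return box_xy, box_wh, box_confidence, box_class_probs

# All-zero raw output on a 19 by 19 grid with 5 anchors and 80 classes.
raw = np.zeros((19, 19, 5, 85))
anchors = np.ones((5, 2))
box_xy, box_wh, box_confidence, box_class_probs = decode_head(raw, anchors, 80)
print(box_xy.shape, box_wh.shape, box_confidence.shape, box_class_probs.shape)
```

The point of the sketch is that the raw output is just unbounded numbers per grid cell and anchor; a head function like this turns them into midpoints, sizes, confidences, and class probabilities that both the loss and the prediction code can share.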

Boxes are converted to image-relative coordinates at the end so they can be passed to a library drawing function (which knows nothing about grid cell dimensions).
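
As a minimal sketch of what that scaling step can look like, assume the boxes arrive as (y1, x1, y2, x2) corners already normalized to the whole image, with (0, 0) at the upper left and (1, 1) at the lower right. The function name scale_to_pixels and the corner ordering are my assumptions for illustration, not necessarily what scale_boxes() does internally.

```python
import numpy as np

def scale_to_pixels(boxes, image_shape):
    """Hypothetical helper, for illustration only.

    boxes:       (N, 4) array of (y1, x1, y2, x2), each in [0, 1]
                 relative to the whole image (assumed ordering).
    image_shape: (height, width) of the original image in pixels.
    """
    height, width = image_shape
    # y values are multiplied by the height, x values by the width.
    image_dims = np.array([height, width, height, width], dtype=float)
    return boxes * image_dims

boxes = np.array([[0.0, 0.0, 1.0, 1.0],      # box covering the whole image
                  [0.25, 0.5, 0.75, 1.0]])   # box in the right half
pixel_boxes = scale_to_pixels(boxes, (720, 1280))
print(pixel_boxes)
```

Under these assumptions, the first box maps to corners (0, 0) and (720, 1280) in pixels, so after scaling nothing is expressed relative to a grid cell anymore.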

Hello,

Thank you so much for your answer! YOLO has been particularly confusing for me.

Thanks for the feedback. Nice to know it was helpful. I have some deep-dive threads on YOLO that I posted during the summer. One is here, which contains links to a couple of the others. The structure of the network output, its relation to ground truth, and the idea of multiple concurrent predictions took me a long time to grok. Those threads are my attempt to explain what I learned over time.
