Week 3 Autonomous_driving_application_Car_detection: confusion about the YOLO algorithm

When completing the filtering and non-max suppression steps for the YOLO algorithm, I realized that the grid information is lost during masking (and during the reordering after non-max suppression). The box positions are all normalized relative to their grid cell and lie in [0, 1]. How, then, do non-max suppression and the final visualization of the boxes work without knowing which grid cell each box came from?

Hi @Jacky43805 ,

In YOLO, the bounding box coordinates are first predicted relative to their grid cells. During post-processing, before NMS runs, these relative coordinates are converted into absolute positions using each grid cell’s location, so every box already carries its own position in the image. Non-Max Suppression (NMS) then works on these absolute coordinates, keeping the boxes with the highest confidence scores and removing overlapping duplicates. So, even though the grid information is not kept after masking and NMS, the boxes are still placed correctly.
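
To make the conversion step concrete, here is a minimal NumPy sketch. It is not the assignment's code: the function name `cell_relative_to_image` and the 19x19 grid are assumptions chosen for illustration. Each cell's column and row index is added to the cell-relative center, and the result is divided by the grid size.

```python
import numpy as np

def cell_relative_to_image(box_xy_cell, grid_w, grid_h):
    """Convert cell-relative box centers to image-normalized centers.

    box_xy_cell: array of shape (grid_h, grid_w, 2) with values in [0, 1],
                 the (x, y) center offset of a box inside its own cell.
    Returns an array of the same shape with centers in [0, 1] over the image.
    """
    # col[i, j] == j (x index of the cell), row[i, j] == i (y index of the cell).
    col, row = np.meshgrid(np.arange(grid_w), np.arange(grid_h))
    cell_index = np.stack([col, row], axis=-1).astype(float)
    grid_dims = np.array([grid_w, grid_h], dtype=float)
    # Shift by the cell's position, then rescale to the whole image.
    return (box_xy_cell + cell_index) / grid_dims

# Hypothetical input: every box sits exactly in the middle of its cell.
centers = np.full((19, 19, 2), 0.5)
absolute = cell_relative_to_image(centers, grid_w=19, grid_h=19)
```

After this step, two boxes with the same cell-relative center but different cells get different coordinates, so the cell index no longer needs to be carried along.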

Hope it helps! Feel free to ask if you need further assistance.

Thank you for taking the time to reply. I really appreciate your insights. I was under the impression that the NMS step compares the highest-probability box with all overlapping boxes, which would mean it needs to compare boxes from different grid cells as well (if they overlap). I’d like to know if there’s something I might be missing here.

Thank you for your reply. I looked back into the assignment and found that this step is implemented in the yolo_head function imported from keras_yolo.py. I was confused because all the demos use normalized, cell-relative data, but what is passed to NMS is already in absolute values. Thanks again for your input!
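
For anyone landing here with the same question, a plain NumPy sketch of greedy NMS shows why no grid information is needed once the boxes are absolute. This is an illustration only, not the routine the assignment calls; the helper names `iou` and `greedy_nms`, the threshold, and the sample boxes are all made up. Boxes are given as corners (x1, y1, x2, y2) in image coordinates.

```python
import numpy as np

def iou(box, others):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    # Clip at zero so boxes that do not overlap get an intersection of 0.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop every remaining box that overlaps the kept box too much,
        # regardless of which grid cell originally predicted it.
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Hypothetical boxes: the first two could come from neighbouring cells.
boxes = np.array([
    [0.10, 0.10, 0.50, 0.50],
    [0.12, 0.12, 0.52, 0.52],
    [0.60, 0.60, 0.90, 0.90],
])
scores = np.array([0.9, 0.8, 0.7])
kept = greedy_nms(boxes, scores)
```

The comparison only ever looks at corner coordinates and scores, which is why boxes predicted by different cells can suppress each other.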

You’re welcome! Happy to help. Feel free to reach out if you need further assistance.

Hi, I just noticed that inside the yolo_head function a non-linearity is applied to the box location and dimensions:
box_xy = K.sigmoid(feats[..., :2])
box_wh = K.exp(feats[..., 2:4])

where feats is the reshaped YOLO output.
I wonder what the intention is here. I am also confused about how this non-linearity leads to an accurate representation of the locations (which should be linear).

Lots of threads have been created about YOLO over the years. Here’s an example that goes into some depth about those specific expressions…
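
As a rough illustration of what those two expressions do (my reading of the usual YOLOv2-style parameterization, not a quote from the linked thread): the network's raw outputs are unconstrained real numbers, so the sigmoid squashes the center offset into (0, 1), and the exponential turns the raw size into a strictly positive factor that scales an anchor box. The raw values and the anchor width below are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw network outputs for one coordinate, any real number.
raw = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

# Center offset inside the cell: always strictly between 0 and 1,
# so the predicted center cannot leave the cell that predicts it.
center_offset = sigmoid(raw)

# Box width: exp is always positive, so the width is a positive
# multiple of the anchor width (raw == 0 reproduces the anchor).
anchor_w = 2.0  # hypothetical anchor width in grid-cell units
box_w = np.exp(raw) * anchor_w
```

So the non-linearity is not there to model the location itself. It maps an unconstrained output onto the valid range for each quantity, and the network learns raw values that give the right box after the mapping.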
