Week 3 Autonomous_driving_application_Car_detection confusion about YOLO algorithm

Jacky43805 · January 30, 2025, 4:59pm

When completing the filtering and non-max suppression steps for the YOLO algorithm, I realized that the grid information is lost during masking (and when the boxes are reordered after non-max suppression). The box positions are all normalized relative to the grid cell and lie in [0, 1]. How, then, do non-max suppression and the final visualization of the boxes work without knowing which grid cell each box came from?

Alireza_Saei · January 30, 2025, 6:18pm

Hi @Jacky43805 ,

In YOLO, the grid information is not kept after masking and non-max suppression, but it is no longer needed by then. The bounding box coordinates are first predicted relative to their grid cells. During post-processing, these relative coordinates are converted into absolute positions using each grid cell’s location, so every box carries its own place in the image. Non-Max Suppression (NMS) works on these absolute coordinates, keeping the boxes with the highest confidence scores and removing the ones that overlap them too much. So, even though the grid details are lost, the boxes are still placed correctly because of this conversion step.
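
To make the conversion step concrete, here is a minimal sketch in plain Python. It assumes a 19x19 grid and a box center given as an offset in [0, 1] inside its cell; the function name `cell_to_image_xy` and its parameters are hypothetical, chosen for illustration, and are not the assignment's own code.

```python
def cell_to_image_xy(offset_x, offset_y, col, row, grid_w=19, grid_h=19):
    """Map a box center from cell-relative to image-normalized coordinates.

    offset_x, offset_y: position inside the cell, each in [0, 1].
    col, row: zero-based index of the grid cell that predicted the box.
    grid_w, grid_h: grid size (19x19 is an assumption for this sketch).
    """
    # Add the cell's position, then rescale so the whole image spans [0, 1].
    x = (col + offset_x) / grid_w
    y = (row + offset_y) / grid_h
    return x, y

# Two boxes with the same in-cell offset end up in different places,
# because the cell index is folded into the coordinate itself.
print(cell_to_image_xy(0.5, 0.5, col=0, row=0))
print(cell_to_image_xy(0.5, 0.5, col=18, row=18))
```

Once the cell index has been folded into the coordinates like this, later steps can drop or reorder boxes freely without losing track of where each one belongs.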

Hope it helps! Feel free to ask if you need further assistance.

Jacky43805 · February 1, 2025, 12:08am

Thank you for taking the time to reply. I really appreciate your insights. I was under the impression that the NMS step compares the highest-probability box with all overlapping boxes, which would mean it needs to compare boxes from different grid cells as well (if they overlap). I’d like to know if there’s something I might be missing here.

Jacky43805 · February 1, 2025, 12:15am

Thank you for your reply. I looked back at the assignment and found that this step is implemented in the yolo_head function imported from the file keras_yolo.py. I was confused because all the demos use the normalized data, but what is passed to NMS is already in absolute values. Thanks again for your input!
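
For anyone following along, here is a rough sketch of why absolute coordinates are enough for NMS. It is a plain greedy variant written for illustration, not the implementation the assignment calls; the names `iou` and `greedy_nms`, the corner format (x1, y1, x2, y2), and the 0.5 threshold are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Image-normalized boxes; nothing here records which grid cell they came from.
boxes = [
    (0.10, 0.10, 0.40, 0.40),
    (0.12, 0.11, 0.42, 0.41),  # nearly the same object as the first box
    (0.60, 0.60, 0.90, 0.90),  # a separate object elsewhere in the image
]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The overlap test only needs box corners in a shared coordinate frame, so boxes predicted by different grid cells can be compared directly.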

Alireza_Saei · February 1, 2025, 12:40am

You’re welcome! Happy to help. Feel free to reach out if you need further assistance.

Jacky43805 · February 2, 2025, 7:15pm

Hi, I just noticed that inside the yolo_head function a non-linearity is applied to the box location and dimensions:
box_xy = K.sigmoid(feats[..., :2])
box_wh = K.exp(feats[..., 2:4])

where feats is the reshaped YOLO output.
I wonder what the intention is here, and I am also confused about how this non-linearity leads to an accurate representation of the locations (which should be linear).

ai_curious · February 2, 2025, 9:56pm

Lots of threads have been created about YOLO over the years. Here’s an example that goes into some depth about those specific expressions…
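
As a rough illustration of what those two expressions do, as far as I understand the YOLOv2 formulation: the raw network outputs are unbounded, so the sigmoid squashes the center offset into (0, 1), keeping it inside the predicting cell, and the exponential turns the raw size output into a positive scale factor on an anchor dimension. The non-linearity is applied to the raw outputs, not to the image coordinates themselves. The helper names and the anchor value below are hypothetical.

```python
import math

def decode_offset(t):
    """Squash a raw output to an in-cell offset in (0, 1) with a sigmoid."""
    return 1.0 / (1.0 + math.exp(-t))

def decode_size(t, anchor):
    """Scale an anchor dimension by exp(t), which is always positive."""
    return anchor * math.exp(t)

# However extreme the raw value, the offset stays inside the cell
# and the size stays positive.
for t in (-5.0, 0.0, 5.0):
    print(t, round(decode_offset(t), 4), round(decode_size(t, anchor=2.0), 4))
```

A raw value of 0 maps to the middle of the cell and to exactly the anchor's size, so the network only has to learn deviations from those defaults.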
