C3W3 - Intersection over Union

The point is that we don’t “let” the algorithm draw unwanted boxes, and in “prediction” mode we don’t know the ideal bounding boxes a priori. It’s only in training that we have the labelled data, right? We just train the algorithm and it does what it does. If we don’t like the results, then we need to retrain it with more and better data, better hyperparameter choices, or both. I have not studied the YOLO paper(s), so I don’t know the full details of how they arrived at this algorithm, but they apparently realized that it frequently detects the same object multiple times with slightly different bounding boxes. Adding this “non-max suppression” step, which uses IoU to discard boxes that overlap heavily with a higher-scoring box, is just a computationally inexpensive way to refine the outputs and get better results.
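To make that concrete, here is a minimal sketch of IoU and a greedy non-max suppression pass in plain Python. This is not the code from the assignment or from the YOLO implementation. The function names `iou` and `non_max_suppression`, the `(x1, y1, x2, y2)` corner format, and the `iou_threshold=0.5` default are all assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2 (hypothetical corner format for this sketch).
    """
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an intersection area of 0.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the boxes to keep, highest score first.

    The threshold value is an assumption for this sketch, not a recommendation.
    """
    # Visit boxes from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one object plus a separate object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # → [0, 2]
```

The first two boxes have an IoU of 81/119 (about 0.68), which is above the threshold, so the lower-scoring one is suppressed. Notice that no ground-truth boxes are involved anywhere: the comparison is only between the predictions themselves, which is why this works in “prediction” mode.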

There are some excellent threads on the forum from the past few years that go deeper into various aspects of YOLO than what we see in the lectures and the assignments. Here’s one that discusses non-max suppression. Please have a look and I hope that will shed more light on this question.
