Non-max suppression vs. Anchor Boxes

I’m trying to figure out the difference between non-max suppression vs. anchor boxes here. In the lectures Prof. Ng said that non-max suppression is repeated individually for each object class to eliminate overlapping bounding boxes.
Let’s say we have overlapping bounding boxes for two different object classes, A and B, where the IoU is greater than 0.6:

  1. What happens to class B’s bounding box if non-max suppression is done first for class A? Does it mean that class B’s bounding box will be removed?
  2. If that’s not the case, then how does having additional anchor boxes make a difference?
  3. What happens when 2 different object classes share similar anchor box shapes, e.g. trees and lamp posts? Can anchor boxes work in this case?

Does this help?


In addition to the thread that Balaji gave you, it’s also worth making the higher level conceptual point that anchor boxes and bounding boxes are not the same thing. NMS has to do only with bounding boxes. There are a number of great threads from ai_curious which explain YOLO in a more thorough way than the course materials. E.g. here’s one that’s relevant to your question. Here’s another that is also worth a look on this subject.


Some thoughts below. Hope it helps.

If I understand what you wrote, NMS is being run per class in your scenario. Therefore, when processing class A, the list of predicted bounding boxes with predicted class A is evaluated and possibly pruned. It’s not clear how or why a predicted bounding box with predicted class B would be considered, let alone removed; that would only happen when evaluating the list of predicted bounding boxes with class B, right? Otherwise, you’re not actually doing it per class. Did I misunderstand?

A) Remember that anchor box shapes are class agnostic; they are derived from the shapes of the ground-truth object bounding boxes that occur in the training data, regardless of class. A car in profile might be wider than tall, whereas a car end-on might be almost square, and a car close up is probably larger than a car far away. For a particular training set, you might therefore end up with four anchor box shapes all derived from the sizes and shapes of cars. It just depends on frequency and which anchor box shapes fall out of the unsupervised learning process (you can find some detailed threads on exactly how this works using forum search and my userid).
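To make that last point concrete, here is a rough NumPy sketch of the kind of unsupervised step I mean: k-means over the ground-truth (width, height) pairs with a 1 - IoU distance, which is roughly the approach the YOLO authors describe. The function names and the sample shapes here are made up purely for illustration, and the real pipeline has more to it.

```python
import numpy as np

def iou_wh(wh, anchors):
    """Shape-only IoU between one (w, h) pair and each anchor,
    assuming the boxes share a corner (class never enters into it)."""
    inter = np.minimum(wh[0], anchors[:, 0]) * np.minimum(wh[1], anchors[:, 1])
    union = wh[0] * wh[1] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def cluster_anchor_shapes(gt_wh, k=5, iters=50, seed=0):
    """Hypothetical k-means over ground-truth (width, height) pairs
    using 1 - IoU as the distance."""
    rng = np.random.default_rng(seed)
    anchors = gt_wh[rng.choice(len(gt_wh), size=k, replace=False)]
    for _ in range(iters):
        # Assign each ground-truth shape to the closest anchor (highest IoU).
        assign = np.array([np.argmax(iou_wh(wh, anchors)) for wh in gt_wh])
        # Recompute each anchor as the mean shape of its cluster.
        for j in range(k):
            members = gt_wh[assign == j]
            if len(members):
                anchors[j] = members.mean(axis=0)
    return anchors

# Example: widths/heights pooled from all classes (cars, trees, lamp posts, ...).
gt_wh = np.array([[0.9, 0.4], [0.5, 0.5], [0.1, 0.6], [0.12, 0.7], [0.8, 0.35]])
print(cluster_anchor_shapes(gt_wh, k=2))
```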

B) Multiple objects of the same shape and size close together can in general cause YOLO some trouble, especially if the anchor box shapes vary significantly. This has to do with the influence anchor box shapes have on training and bounding box prediction accuracy. This is also discussed in some detail elsewhere.

C) Finally, NMS cares about (predicted*) bounding box overlaps. At one extreme, IOU == 1.0, the pixels of the two bounding boxes are completely identical, and you only need to keep one of the two. At the other extreme, IOU == 0.0, the two bounding boxes are disjoint, and you need to keep both. Notice that you don’t need to know anything about class to make either of these decisions. For intermediate values, 0.0 < IOU < 1.0, NMS uses a threshold parameter to decide whether the two boxes contain the same object. It could be a small car in front of an SUV (i.e. two objects with a predicted class of car, the same object center, and differing sizes), a predicted class of lamp post colocated with an object with a predicted class of tree, or even two predictions that are actually of the same object in the image. Note that IOU is sufficient to make a decision in each case; incorporating class isn’t guaranteed to help. Can using class-agnostic NMS make mistakes that degrade performance? Yes - it might incorrectly suppress true positives. Can using per-class NMS make mistakes that degrade performance? Yes - it might incorrectly retain false positives. Which is worse? It depends on your application. Train and evaluate carefully and use the option that performs best in your operational environment.

*I write predicted at the beginning of this paragraph because that is what YOLO and the autonomous vehicle exercise invoke NMS with, but actually NMS doesn’t care what the bounding box list is comprised of or how it was created. You pass it a list; it determines whether the list needs to be pruned to remove duplicates.
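For anyone who wants to see the class-agnostic behavior in code, here is a minimal NumPy sketch of greedy NMS along the lines described above. The corner-coordinate box format and the 0.6 threshold are just assumptions for the example; note that nothing in it looks at class.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.6):
    """Greedy, class-agnostic NMS: keep the highest-scoring box, suppress
    any remaining box that overlaps it beyond the threshold, repeat."""
    order = np.argsort(scores)[::-1]             # highest confidence first
    keep = []
    while order.size:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]  # drop boxes that overlap too much
    return keep

# Two heavily overlapping predictions (say "tree" and "lamp post") plus a distant one.
boxes = np.array([[10, 10, 50, 90], [12, 11, 52, 92], [200, 40, 240, 120]], dtype=float)
scores = np.array([0.9, 0.7, 0.8])
print(nms(boxes, scores))   # -> [0, 2]: the lower-scoring duplicate is suppressed
```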

Cheers


Hi @balaji.ambresh, thanks for sharing this. It will be very helpful for understanding the upcoming assignment!


Hi @ai_curious, thank you very much for sharing your knowledge (also thanks to @paulinpaloalto for all the incredible links).
With regard to your first reply:

  • If I understand you correctly, even though in a forward pass there are multiple classes being predicted from an input image (which is why I thought there would be bounding boxes coming from multiple classes), NMS basically only looks at each class individually and ignores other classes and their associated bounding boxes during the pruning process?

Sorry I wasn’t more clear. This is exactly what doesn’t happen. The TensorFlow implementation of NMS, and the one you build as an exercise, doesn’t know or care about class. It accepts a list of bounding boxes and returns a list of bounding boxes without regard to class. One could separate the output of the NN by class before invoking NMS, however, I am of the opinion that this defeats the purpose of duplicate removal. Others differ.

In any case, I suggest that class filtering should not be done by the NMS algorithm itself. It just makes the code more complex, and more complex means more test cases and more error leakage. If you decide it’s necessary, separate your bounding boxes before invoking NMS: one helper function to separate the lists by class, one helper function to prune duplicates from each list. Oh, and one more helper function to disambiguate two bounding boxes with different predicted classes but identical locations :man_facepalming:
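If you do go the per-class route, a hypothetical wrapper might look something like the sketch below, which splits the predictions by predicted class and hands each group to TensorFlow’s class-agnostic tf.image.non_max_suppression. The function name nms_per_class and the tensor shapes are my own assumptions, not the assignment’s API.

```python
import tensorflow as tf

def nms_per_class(boxes, scores, classes, max_boxes=10, iou_threshold=0.6):
    """Hypothetical per-class wrapper: split predictions by predicted class,
    run class-agnostic NMS on each group, then recombine the survivors.

    boxes:   (N, 4) float tensor, (y1, x1, y2, x2)
    scores:  (N,)   float tensor, confidence of each box
    classes: (N,)   int tensor, predicted class id of each box
    """
    kept = []
    for c in tf.unique(classes)[0]:
        idx = tf.reshape(tf.where(classes == c), [-1])     # boxes predicted as class c
        selected = tf.image.non_max_suppression(
            tf.gather(boxes, idx), tf.gather(scores, idx),
            max_output_size=max_boxes, iou_threshold=iou_threshold)
        kept.append(tf.gather(idx, selected))               # map back to original indices
    return tf.concat(kept, axis=0)

# Two colocated boxes with different predicted classes both survive per-class NMS,
# whereas class-agnostic NMS would keep only the higher-scoring one.
boxes = tf.constant([[0., 0., 1., 1.], [0.02, 0.02, 1.02, 1.02]])
scores = tf.constant([0.9, 0.8])
classes = tf.constant([0, 1])
print(nms_per_class(boxes, scores, classes))   # -> both indices retained
```

Note that the example at the bottom shows exactly the trade-off discussed above: the overlapping box with a different predicted class is retained, which may be a correct detection or a duplicate you wanted suppressed.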
