W3: Question with non max suppression and classes

I have a maybe naive question about non max suppression when there are multiple classes.

When we do non-max suppression, should we temporarily ignore the fact that different boxes belong to different classes and perform non-max suppression only once? Or should we do it per class: for each class, pick out only the boxes that belong to that class, apply non-max suppression, and then gather the surviving boxes from all classes together? In other words, if we have two boxes whose IOU is bigger than the threshold but which belong to different classes, should we keep both, or keep only the one with the higher probability?

I think the per-class approach is what the lecture is recommending, but the code in W3A1, UNQ_C3 seems to ignore the class. Thanks!


Hi Tianle,
I'm curious about your question and need further details. Could you please paste the code from W3A1, UNQ_C3?
Non-max suppression should be done individually for each class, or in fact for each object detected.

  • The first step is to ignore all boxes with probability less than a threshold.
  • Then, for overlapping boxes whose IOU is more than a threshold, ignore the one with the lesser probability. (The reason is that both boxes are predicting the same object.)
  • What remains are the actual objects detected and their bounding boxes.
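The steps above can be sketched in plain Python. This is only an illustration, not the notebook's code: the function names, the corner format (x1, y1, x2, y2), the example boxes, and the 0.6 score threshold are all assumptions made for the example. Note that this sketch never looks at the class of a box.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_then_nms(boxes, scores, score_threshold=0.6, iou_threshold=0.5):
    """Return the indices of the surviving boxes, highest score first."""
    # Step 1: ignore boxes whose score is below the threshold.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    # Step 2: visit boxes from highest to lowest score and keep a box only
    # if it does not overlap too much with a box that was already kept.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Hypothetical data: boxes 0 and 1 overlap heavily, box 2 is far away,
# and box 3 has a score below the threshold.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140), (11, 11, 51, 51)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = filter_then_nms(boxes, scores)
print(kept)
```

Box 1 is dropped because it overlaps box 0 and has a lower score, and box 3 never reaches the NMS step.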

Could you please paste the code where it is ignoring this… I had similar confusion when doing this exercise.


@vsnupoudel Here you go:

# UNQ_C3 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: yolo_non_max_suppression

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    """
    Applies Non-max suppression (NMS) to set of boxes
    
    Arguments:
    scores -- tensor of shape (None,), output of yolo_filter_boxes()
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
    classes -- tensor of shape (None,), output of yolo_filter_boxes()
    max_boxes -- integer, maximum number of predicted boxes you'd like
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
    
    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None, ), predicted class for each box
    
    Note: The "None" dimension of the output tensors has obviously to be less than max_boxes. Note also that this
    function will transpose the shapes of scores, boxes, classes. This is made for convenience.
    """
    
    max_boxes_tensor = tf.Variable(max_boxes, dtype='int32')     # tensor to be used in tf.image.non_max_suppression()

    ### START CODE HERE
    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    ##(≈ 1 line)
    nms_indices = None
    
    # Use tf.gather() to select only nms_indices from scores, boxes and classes
    ##(≈ 3 lines)
    scores = None
    boxes = None
    classes = None
    ### END CODE HERE

    
    return scores, boxes, classes

From my research and the full notebook code, my understanding is that it is not class-based.


More than you ever wanted to know about YOLO and NMS.

NMS prunes duplicates
Non-max suppression is used in YOLO to suppress predictions likely to be of the same object. It ignores the class predictions and relies only on the Jaccard similarity coefficient, aka IOU, computed from the predicted center locations and shapes after they are transformed into bounding boxes.

If the IOU between two predicted bounding boxes is sufficiently high, they are treated as if they are the same object, and only one of the two, the one with the higher confidence, is retained.

Here’s why…

Why duplicate predictions might occur
Multiple predictions of the same object can come either from multiple anchor boxes in the same grid cell or from predictions in neighboring grid cells that each place the object center within their own cell (NOTE: one of these is incorrect - the actual center can only be in one grid cell at a time).

Why use IOU alone for duplicate detection?
First, consider the case of low IOU. This means either the locations are different, or the shapes are different, or both. If the locations and shapes are disjoint, the IOU is 0, they must be different objects, regardless of class, so keep them both. If the locations are similar but the shapes are sufficiently different that IOU is low, they must be different objects, so keep them both. Again, regardless of class. This is what enables YOLO to detect a Person standing in front of a Car, for example. Same center location prediction, different shapes.

Only if the location and shape are both sufficiently similar that IOU is high, assume they are duplicates and keep only the one with the highest confidence, even if the predicted class of the two objects is different.

In the limit that IOU is 1, the location in the image is identical and the shape is also identical; the prediction is that they share the same pixels. In this case, they are in effect superimposed on each other, so one is occluded; even if the class predictions are different, keep only the one with the highest confidence. The simplifying assumption is that if two bounding boxes contain the same (or almost the same) pixels, they must enclose the same object.
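To put rough numbers on these three cases, here is a small sketch. The `iou` helper, the corner format (x1, y1, x2, y2), and the box coordinates are made up for illustration; they are not taken from the notebook.

```python
def iou(a, b):
    """IoU (Jaccard index) of two boxes given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: a tall, narrow "person" and a wide, short "car"
# that share the same center point (50, 50).
person = (40, 0, 60, 100)
car = (0, 30, 100, 70)
far_away = (200, 200, 240, 240)

same_center = iou(person, car)   # same location, different shapes
disjoint = iou(person, far_away) # no shared pixels
identical = iou(car, car)        # exactly the same pixels

print(same_center, disjoint, identical)
```

With these particular boxes the person/car pair shares an 800-pixel intersection out of a 5200-pixel union, so its IOU is about 0.15. That is well under a typical 0.5 threshold, so both boxes would survive NMS even though their centers coincide.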

Doesn’t this make some mistakes?
Yes, but YOLO is optimized for frame-rate throughput. A small degradation of accuracy is acceptable in the name of speed. It's an engineering tradeoff, where this approach was deemed to have more benefit (pruning of true duplicates) than cost (false positive duplicate designation). All the thresholds are parameters that can be tuned empirically based on a confusion matrix. HTH

So, in a nutshell, the programming exercise does not carry out NMS separately for each class, though that was suggested in the video lecture.

And you are suggesting that is okay because the probability of IoU > threshold for different classes is very low.


Correct

I think I last watched the video 5 years ago, so I really can’t say. But the question recurs periodically, so I suppose this is also correct.

The YOLO algorithm produces thousands of predictions on each forward pass. It is possible that some of the predicted bounding boxes overlap. NMS with IOU is one way to try to remove false positives, that is, two separate predictions that are actually the same object in the image.

Suppose two bounding box predictions are identical in both location and shape, meaning they occupy exactly the same pixels in the image. This could happen if the center of an object is near a grid cell boundary and two distinct detectors (grid cell + anchor box) each think they are responsible for making the prediction. In this thought experiment there is only one object in the image but two predicted bounding boxes, and their IOU == 1.0.

In this case you only want one prediction to survive for downstream processing, and you can achieve this by keeping only the highest confidence prediction regardless of what the respective class predictions are.

At the other end of the scale, suppose there are two objects in the image and the IOU of their predicted bounding boxes is very low. In this case you want NMS to keep both predictions. Again, you can make this decision based only on predicted bounding box location and shape and regardless of their class predictions.

The problem that I see with running NMS separately for all classes is that afterwards you can still have two predictions in exactly the same location and exactly the same shape, but now no way to rule one out.
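Here is a toy comparison of the two strategies on exactly that situation. It is a sketch under assumptions: `nms` and `nms_per_class` are hypothetical helpers written for this post, not functions from the notebook or from TensorFlow, and the boxes, scores, and class labels are invented.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(indices, boxes, scores, iou_threshold=0.5):
    """Greedy NMS over the given box indices, ignoring class labels."""
    order = sorted(indices, key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


def nms_per_class(boxes, scores, classes, iou_threshold=0.5):
    """Run NMS separately inside each class, then merge the survivors."""
    kept = []
    for c in sorted(set(classes)):
        members = [i for i, ci in enumerate(classes) if ci == c]
        kept.extend(nms(members, boxes, scores, iou_threshold))
    return sorted(kept)


# Two predictions with identical location and shape but different classes.
boxes = [(10, 10, 50, 50), (10, 10, 50, 50)]
scores = [0.9, 0.6]
classes = ["dog", "cat"]

agnostic = nms(range(len(boxes)), boxes, scores)
per_class = nms_per_class(boxes, scores, classes)
print(agnostic, per_class)
```

The class-agnostic run keeps only box 0, while the per-class run keeps both boxes, because the two identical boxes are never compared with each other.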

A point I have tried to make in this and other related threads is that this stuff is engineering applied to achieve specific business outcomes. There isn’t one rule that you can just always apply that is always right. Use the technology that fits the operational situation and needed results. For the YOLO inventors, non-class NMS addressed their needs. I have never experienced a suitably trained YOLO model that got better performance running NMS per class. Your mileage may vary.

this was helpful