Non-max suppression Clarification

Hi Sir,

@thearkamitra
@arosacastillo
@AmmarMohanna
@XpRienzo
@reinoudbosch
@chrismoroney39
@paulinpaloalto

  1. Why do multiple detections happen in the first place? They should not occur, right? According to the YOLO bounding box detection method, the midpoint of the object is assigned to one grid cell, and the bounding box is then drawn relative to that grid cell. If so, why does the algorithm detect multiple bounding boxes?

  2. Non-max suppression can easily pick the bounding box with the highest probability among all the boxes. If so, why do we discard boxes with high IOU?

You are correct that each object is assigned to one and only one location when the training data is established. But runtime predictions do not work from the training labels, and even if they did, predictions are never 100% accurate. This means more than one output location of the network (a grid cell plus anchor box tuple) can make a prediction involving the same object.

Given the possibility (above) of more than one prediction on the same object, the purpose of NMS is to remove likely duplicates. IOU is a measure of region similarity, covering both location and shape. The higher the IOU of two boxes passed to NMS, the higher the likelihood that they are predictions on the same object in the image. Two different objects would likely have different shapes even if their center locations were identical (think of a person standing in front of a car) and thus a low IOU. It is counterintuitive, but in this part of the algorithm low IOU means keep and high IOU means throw away.
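To make the IOU point concrete, here is a minimal Python sketch. The corner format (x1, y1, x2, y2), the function name `iou`, and the box coordinates are all made up for illustration; they are not taken from the course code. The person and car boxes share the same center but have very different shapes, while the third box is a slightly shifted second prediction on the person.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if there is any overlap).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so boxes that do not overlap give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: person and car are both centered at (5, 3).
person = (4.0, 0.0, 6.0, 6.0)      # tall and narrow
car = (0.0, 2.0, 10.0, 4.0)        # wide and short
person_dup = (4.2, 0.1, 6.2, 6.1)  # second prediction on the same person

print(iou(person, car))         # low: same center, different shape
print(iou(person, person_dup))  # high: almost certainly the same object
```

With these numbers the person/car pair has an IOU of about 0.14, while the two person boxes come out near 0.79, so only the second pair looks like a duplicate.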

It is not enough to rely only on confidence, since then you would keep only the person or the car, whichever had the highest p_c. If the confidence of each box is above the threshold and the IOU between them is low, keep both.
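The keep/discard rule can be sketched as a small greedy loop. This is an illustrative, class-agnostic sketch and not the course implementation: the function names, the thresholds (0.6 for confidence, 0.5 for IOU), and the example boxes and scores are all assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, score_threshold=0.6, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest confidence first."""
    # Step 1: drop predictions whose confidence is below the threshold.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    # Step 2: visit the remaining boxes from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if its IOU with every already-kept box is low;
        # a high IOU marks it as a likely duplicate of a stronger box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Hypothetical predictions: person, duplicate person, car, weak box.
boxes = [
    (4.0, 0.0, 6.0, 6.0),    # person, p_c = 0.9
    (4.2, 0.1, 6.2, 6.1),    # duplicate of the person, p_c = 0.7
    (0.0, 2.0, 10.0, 4.0),   # car behind the person, p_c = 0.8
    (20.0, 20.0, 22.0, 22.0) # weak prediction, p_c = 0.3
]
scores = [0.9, 0.7, 0.8, 0.3]

print(non_max_suppression(boxes, scores))  # [0, 2]
```

The person and the car both survive because their IOU is low, the duplicate person box is thrown away because its IOU with the stronger person box is high, and the weak box never gets past the confidence threshold. Keeping only the single highest p_c box would have lost the car.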