I’m not clear on what it is we think needs improvement. Here is how the TensorFlow doc starts its description of NMS: “Prunes away boxes that have high intersection-over-union (IOU) overlap with previously selected boxes.”

Consider the limiting case where IOU is 1. That means two detectors agree exactly on the object location: they produced the same predicted bounding box. As described, the class prediction is ignored and only the highest-confidence prediction is retained. However, if each detector made a different class prediction and you run NMS separately per class, then both predictions are kept. Notice that only one of these predictions can be correct; we can’t have two different types of objects occupying exactly the same pixels of an image. But we can’t really call it non-max suppression if we are not suppressing the lower-confidence predictions, right? To disambiguate these two predictions that have survived the pipeline, a further processing step will be required.

I think it is a legitimate question whether NMS improves accuracy and precision overall, but NMS run on classes separately is an oxymoron.
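Here’s a minimal sketch of that limiting case using tf.image.non_max_suppression (the boxes, scores, and class ids are invented for illustration): with two identical boxes, class-agnostic NMS suppresses the weaker one, while per-class NMS keeps both.

```python
import tensorflow as tf

# Two detectors agree exactly on the location (IOU = 1) but disagree on
# the class. Boxes, scores, and class ids are made up for illustration.
boxes = tf.constant([[0.1, 0.1, 0.5, 0.5],   # detector A's box
                     [0.1, 0.1, 0.5, 0.5]])  # detector B's box, identical
scores = tf.constant([0.9, 0.8])
classes = [0, 1]  # e.g. A says "cat", B says "dog"

# Class-agnostic NMS: the lower-confidence duplicate is suppressed.
kept = tf.image.non_max_suppression(
    boxes, scores, max_output_size=10, iou_threshold=0.5)
print(kept.numpy())  # [0] -- only the 0.9 prediction survives

# Per-class NMS: each class is processed in isolation, so the two boxes
# never compete and *both* survive, despite covering the same pixels.
for c in sorted(set(classes)):
    idx = [i for i, cls in enumerate(classes) if cls == c]
    kept_c = tf.image.non_max_suppression(
        tf.gather(boxes, idx), tf.gather(scores, idx),
        max_output_size=10, iou_threshold=0.5)
    print(c, tf.gather(idx, kept_c).numpy())  # class 0 -> [0], class 1 -> [1]
```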