CNN week 3 exercise 1: tf.image.non_max_suppression

Hi,

In week 3 exercise 1 (Autonomous_driving_application_Car_detection) we are told to use tf.image.non_max_suppression and to set max_output_size equal to max_boxes.

If I understand correctly, this function applies non-maximum suppression to remove boxes with large overlap, and the max_output_size parameter specifies the maximum number of boxes to keep after the suppression has been applied?
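For concreteness, here is a minimal sketch of the call in question (the boxes, scores, and thresholds are made up for illustration and are not the assignment's values):

```python
import tensorflow as tf

# Made-up boxes in [y1, x1, y2, x2] format, with one heavily overlapping pair
boxes = tf.constant([
    [0.0, 0.0, 1.0, 1.0],   # box 0
    [0.0, 0.1, 1.0, 1.1],   # box 1, overlaps box 0 (IoU ~ 0.82)
    [0.0, 2.0, 1.0, 3.0],   # box 2, no overlap with the others
], dtype=tf.float32)
scores = tf.constant([0.9, 0.8, 0.7], dtype=tf.float32)

# Keep at most max_output_size boxes after suppressing overlaps above iou_threshold
selected = tf.image.non_max_suppression(
    boxes, scores, max_output_size=10, iou_threshold=0.5)

print(selected.numpy())  # [0 2] -- box 1 is suppressed by box 0, box 2 survives
```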

But why do we want to cap the output with max_output_size instead of letting it return all the boxes? We have already filtered out the boxes with a low score (so the CNN is confident in the remaining ones), and non_max_suppression has already removed the boxes with high overlap.

So aren’t we now removing boxes that are non-overlapping and have a high confidence of containing an object? Or am I misunderstanding something?

Thank you in advance

I believe your understanding is correct, but the point is that you get to select the appropriate value for that parameter based on the types of images you are handling. My understanding is that the NMS operation is performed in a loop over the individual grid cells: NMS gives you the best-quality prediction for each individual object that the model has detected in that particular grid cell. So given the size of your grid cells and the resolution of your images, what is a reasonable limit on the maximum number of objects you might see in a single grid cell? They’ve used 10 as the default value in the definition of the function, but you may need a higher value for your application. That’s up to you, based on testing your model and observing its behavior.
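To make the effect of the cap concrete, here is a small sketch (with made-up, well-separated boxes) showing that once the number of surviving boxes exceeds max_output_size, the lowest-scoring ones are dropped even though they don’t overlap anything:

```python
import tensorflow as tf

# Five non-overlapping boxes, all with fairly high scores
boxes = tf.constant(
    [[float(i), 0.0, float(i) + 0.9, 0.9] for i in range(5)], dtype=tf.float32)
scores = tf.constant([0.95, 0.90, 0.85, 0.80, 0.75], dtype=tf.float32)

# With max_output_size=3, only the 3 highest-scoring boxes are kept;
# boxes 3 and 4 are dropped by the cap alone, not because of overlap.
kept = tf.image.non_max_suppression(
    boxes, scores, max_output_size=3, iou_threshold=0.5)

print(kept.numpy())  # [0 1 2]
```

So the parameter is simply a hard upper limit on how many detections are returned, independent of confidence or overlap, which is why it should be tuned to the maximum number of objects you realistically expect to see.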
