Question for the assignment: Autonomous_driving_application_Car_detection
If I understand correctly, the variables boxes, scores, and classes returned from yolo_filter_boxes cover all possible classes, i.e. boxes contains the box coordinates and scores contains the scores for boxes of every class. We use classes to know which class a given box/score belongs to.
When we call tf.image.non_max_suppression(boxes, scores, …), its parameters do not include classes. How can that function differentiate between boxes of different classes? We should suppress only boxes of the same class, right?
Thanks for your excellent question.
The call to tf.image.non_max_suppression takes max_boxes_tensor as a parameter. This tensor is passed to max_output_size in tf.image.non_max_suppression.
The documentation for tf.image.non_max_suppression (TensorFlow Core v2.4.1) states the following:
max_output_size: A scalar integer Tensor representing the maximum number of boxes to be selected by non-max suppression.
This is irrespective of class. So, the number of box indices returned for a picture by the call to tf.image.non_max_suppression is at most max_output_size, regardless of which class the boxes belong to. Along the way, overlapping boxes are removed according to iou_threshold.
In other words, the call to tf.image.non_max_suppression does the following: it removes overlapping boxes and returns up to max_boxes box indices, irrespective of which class the boxes belong to. There can be multiple objects of the same class! The classes of the selected boxes are then looked up from the returned indices with tf.gather.
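As a rough illustration, here is a minimal plain-Python sketch of that behaviour. It is not TensorFlow's implementation: iou and nms_indices are hypothetical helpers written for this post, the boxes are made up and assumed to be in (y1, x1, y2, x2) form, and the list comprehension at the end stands in for tf.gather.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_indices(boxes, scores, max_output_size, iou_threshold):
    """Class-agnostic greedy NMS: returns at most max_output_size indices."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_output_size:
            break
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Made-up detections; class ids are arbitrary integers.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (40, 40, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.6]
classes = [2, 2, 2, 0]

keep = nms_indices(boxes, scores, max_output_size=10, iou_threshold=0.5)
# The classes are never passed to NMS; they are looked up afterwards
# from the selected indices (the role tf.gather plays in the assignment).
kept_classes = [classes[i] for i in keep]
print(keep, kept_classes)
```

Box 1 overlaps box 0 heavily and is dropped, while box 2 is kept even though it has the same class as box 0, because the two do not overlap.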
I hope this clarifies things.
Thank you very much for your explanation. That confirms my thought.
That brings up my next question. Suppose anchor 1 of a cell has a box with probability 0.9 for a car, anchor 2 of the same cell has another box with probability 0.85 for a passenger, and the IoU of these two boxes is 0.7. This may be the case for the example in the lecture. If we run non_max_suppression on all boxes regardless of class, as in the assignment, and iou_threshold is below 0.7, then the box from anchor 2 will be suppressed even though it is a valid detection.
I understand that the assignment is not a real application. I just want to know whether, in practice, the correct solution would be to group the boxes by class and run NMS separately on each of those sets of boxes. For the assignment, that would mean running NMS 80 times, once for each of the 80 classes. Is that right?
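To make the idea concrete, here is a minimal plain-Python sketch of what I mean, not the assignment's code. The helpers iou, nms_indices, and per_class_nms are names I made up, the boxes are invented so that the car and passenger boxes have an IoU of about 0.7, and boxes are assumed to be in (y1, x1, y2, x2) form.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (y1, x1, y2, x2)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_indices(boxes, scores, max_output_size, iou_threshold):
    """Class-agnostic greedy NMS over one set of boxes."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_output_size:
            break
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def per_class_nms(boxes, scores, classes, max_output_size, iou_threshold):
    """Run NMS once per class, then merge the survivors by score."""
    keep = []
    for c in sorted(set(classes)):
        # Indices (into the original lists) of the boxes predicted as class c.
        idx = [i for i, k in enumerate(classes) if k == c]
        local = nms_indices([boxes[i] for i in idx], [scores[i] for i in idx],
                            max_output_size, iou_threshold)
        keep.extend(idx[j] for j in local)
    keep.sort(key=lambda i: scores[i], reverse=True)
    return keep[:max_output_size]


# Box 0: car, box 1: passenger overlapping the car, box 2: duplicate car box.
boxes = [(0, 0, 10, 10), (0, 1.75, 10, 11.75), (0.5, 0, 10.5, 10)]
scores = [0.9, 0.85, 0.6]
classes = ["car", "passenger", "car"]

agnostic = nms_indices(boxes, scores, max_output_size=10, iou_threshold=0.5)
per_class = per_class_nms(boxes, scores, classes, max_output_size=10, iou_threshold=0.5)
print(agnostic, per_class)
```

With these numbers the class-agnostic run keeps only the car box, while the per-class run keeps both the car and the passenger and still drops the duplicate car box.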
Again, thank you very much for your help.
Thanks for your reply.
Yes, an actual implementation of YOLO predicts all bounding boxes across all classes (which in the case of the assignment would require doing what you suggest). Here’s a good overview of current systems with links to important articles that describe the approaches taken:
Thank you very much Reinoud.