As I understand it, in the YOLO assignment, after running yolo_filter_boxes, all the different anchor boxes that pass the filtering are mixed together, so when yolo_non_max_suppression runs, one anchor box can suppress a different anchor box. Say we have a bike and a car in the same grid cell: if the IoU of the two boxes is more than 0.5 (which it most probably is), we will keep only one of the two objects. Am I right, or am I missing something?
There are lots of threads discussing YOLO, since it’s by far the most complex algorithm we’ve seen up to this point. Here’s a good one to start down the path of understanding anchor boxes and NMS. And here’s another one that is structured more like a presentation on the topic.
Thanks a lot!
@RezaOmrani, as you dig deeper into the ideas behind YOLO, pay close attention to the terms ground truth bounding box, predicted bounding box, and anchor box. Each has a different meaning and role in YOLO, and they cannot be used interchangeably. In the original question above, anchor box is likely not the correct one of those three to be using. Let us know what you find!
EDIT - here’s my take on definitions for these important terms -
Ground truth bounding boxes are part of the training data and are not YOLO-specific. Any object detection algorithm run on this data set will use them. They are static, used at training time but not at runtime, play no role in non_max_suppression, and thus are not filtered or suppressed. They have shape and location. During training, ground truth bounding boxes are compared to the predicted bounding boxes output by a forward propagation, and the error between them is part of the cost function that drives learning.
Anchor boxes are a small set of shapes determined through exploratory analysis on the training data. As a group, they are the shapes, found using K-means clustering, that minimize IOU error against the ground truth bounding boxes. Anchor boxes have shape only, no location. They are static and fixed, determined prior to training time. For the autonomous driving class exercise, the number of anchor boxes chosen is 5. Other small integers are sometimes used, e.g. 3, 7, and 9. Anchor boxes play no role in non_max_suppression and thus are not filtered or suppressed.
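To make the clustering idea concrete, here is a minimal sketch in plain Python of K-means over ground truth box shapes, assigning each shape to the centroid it has the highest IOU with. The names shape_iou and kmeans_anchors, the (width, height) tuple format, and the toy data are my own assumptions for illustration, not code from the assignment or from any YOLO release.

```python
import random

def shape_iou(a, b):
    """IOU of two (width, height) shapes aligned at a common corner.
    Location is ignored on purpose: anchor boxes have shape only."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(shapes, k, iters=100, seed=0):
    """Cluster ground truth (width, height) pairs into k anchor shapes.
    Hypothetical helper: 'closest' means highest IOU, not Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(shapes, k)  # start from k distinct training shapes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in shapes:
            best = max(range(k), key=lambda j: shape_iou(s, centroids[j]))
            clusters[best].append(s)
        # New centroid = mean width and height of its members;
        # an empty cluster keeps its previous centroid.
        new = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new == centroids:  # assignments stopped changing
            break
        centroids = new
    return centroids

# Toy data: three tall shapes and three wide shapes (made-up numbers).
shapes = [(1.0, 3.0), (1.1, 3.1), (0.9, 2.9), (4.0, 1.0), (4.1, 1.1), (3.9, 0.9)]
anchors = sorted(kmeans_anchors(shapes, k=2))
print(anchors)
```

On this toy data the two anchors settle on one tall shape and one wide shape. Note that nothing here looks at where a box sits in the image, which is why anchor boxes never take part in suppression.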
Predicted bounding boxes are part of the set of object detection predictions produced by a forward propagation of the YOLO algorithm. (The other parts are the class and object present/absent predictions.) Predicted bounding boxes have shape and location. Due to the possibility of the algorithm producing multiple detections on the same image object, predicted bounding boxes are submitted to non_max_suppression, which will attempt to remove duplicates through pairwise comparison using the Jaccard Index or IOU. For any set of predicted bounding boxes that are determined to be of the same image object, only the single prediction with the highest confidence will be retained; the non max predictions of that object will be subject to suppression. Nomen est omen.
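For anyone who wants to see the pairwise comparison spelled out, here is a minimal sketch of greedy non-max suppression in plain Python. It is not the assignment's yolo_non_max_suppression; the names iou and greedy_nms, the (x1, y1, x2, y2) corner format, and the 0.5 default threshold are assumptions chosen for illustration. This sketch also ignores class labels, so two heavily overlapping predictions are treated as duplicates whatever their classes; a per-class variant would run the same loop separately for each class.

```python
def iou(a, b):
    """IOU (Jaccard Index) of two boxes given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the predicted boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two overlapping predictions of one object plus one separate prediction.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # → [0, 2]
```

The first two boxes have an IOU of 81/119, about 0.68, so the lower-scoring one is suppressed; the third box overlaps nothing and survives.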
Hope this helps