I have a quick question about non-max suppression. Why not just pick the highest-scoring box directly from all the bounding boxes and discard all the remaining ones? Why do we choose the more complex iterative procedure of picking the highest-scoring box, discarding the boxes that have a high IoU with it, and repeating on what remains? Thank you.
What is the total number of objects you could detect per image with the max-score-only approach? Is that aligned with the design objective of YOLO?
There is an elaboration of the answer within this thread on the implications of grid cells and anchor boxes in YOLO.
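To make the hint concrete, here is a minimal NumPy sketch comparing the two approaches on a toy image with two objects. It is only an illustration, not YOLO's actual implementation: the function names `iou` and `non_max_suppression`, the `[x1, y1, x2, y2]` box format, the sample boxes and scores, and the 0.5 IoU threshold are all assumptions made for this example.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Clip at zero so non-overlapping boxes get an intersection of 0.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = np.argsort(scores)[::-1]  # box indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Discard only boxes that overlap heavily with the chosen one;
        # boxes around other objects survive to the next iteration.
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps < iou_threshold]
    return keep

# Hypothetical detections: two objects, each with a near-duplicate box.
boxes = np.array([
    [10, 10, 50, 50],      # object A
    [12, 12, 52, 52],      # duplicate of A
    [100, 100, 150, 150],  # object B
    [102, 98, 152, 148],   # duplicate of B
], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.6])

max_only = [int(np.argmax(scores))]        # keeps a single box per image
kept = non_max_suppression(boxes, scores)  # keeps one box per object

print("max score only:", max_only)
print("non-max suppression:", kept)
```

With the max-score-only approach you end up with exactly one box, so object B is lost. The iterative procedure removes only the duplicates around each chosen box, which leaves one box for every object in the image.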