Hey @ai_curious
One topic that the lectures do not really touch upon is how the raw output from YOLO gets transformed into final predictions. In our programming assignment, that is basically whatever goes on inside the yolo_head function in the yad2k.models.keras_yolo file.
The bounding box xy coordinates go through a sigmoid activation plus some offsetting, and its height and width are passed through an exponential and then scaled by the anchor height and width. Any insights / intuition from anyone on what exactly is going on here?
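To make the question concrete, here is a rough NumPy sketch of the decoding as I currently understand it. It is not the actual yolo_head code: the function name decode_boxes, the (t_x, t_y, t_w, t_h) layout of the last axis, and the assumption that anchors are given as (width, height) in grid-cell units are all mine.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_boxes(raw, anchors):
    """Hypothetical helper, not part of yad2k.

    raw:     (grid_h, grid_w, num_anchors, 4), last axis assumed (t_x, t_y, t_w, t_h)
    anchors: (num_anchors, 2), assumed (width, height) in grid-cell units
    Returns (b_x, b_y, b_w, b_h) as fractions of the full image.
    """
    raw = np.asarray(raw, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    grid_h, grid_w = raw.shape[:2]

    # Column (x) and row (y) index of each cell, shaped to broadcast over anchors.
    c_x = np.arange(grid_w).reshape(1, grid_w, 1)
    c_y = np.arange(grid_h).reshape(grid_h, 1, 1)

    # Sigmoid keeps the center inside its own cell; the offset places the cell in the grid.
    b_x = (sigmoid(raw[..., 0]) + c_x) / grid_w
    b_y = (sigmoid(raw[..., 1]) + c_y) / grid_h

    # exp() gives a positive multiplier on the anchor's width and height.
    b_w = anchors[:, 0] * np.exp(raw[..., 2]) / grid_w
    b_h = anchors[:, 1] * np.exp(raw[..., 3]) / grid_h

    return np.stack([b_x, b_y, b_w, b_h], axis=-1)
```

Under these assumptions, an all-zero raw output puts each box at the center of its cell with exactly the anchor's shape, so the network only has to learn small corrections relative to the anchor.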
Was trying to read blogs online and here’s what one of them mentions…
Would be really helpful if someone can give more intuition around anchor boxes, I have a feeling that there’s a lot more that we need to grasp here…
Another thing: I just looked at the formula for the localization loss in YOLO. If it’s computed as the sum of squared errors between the ground truth boxes and the predicted bounding boxes, then what role exactly do anchor boxes have to play? All I can grasp is that they are there just to hold multiple classes… getting very confused now with these anchor boxes!
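My tentative guess is that anchors enter the loss through assignment: each ground truth box is matched to the anchor whose shape overlaps it best, and only that anchor's prediction in the cell contributes to the squared error for that box. Below is a minimal sketch of that matching step, assuming boxes and anchors are compared by IoU as if they shared the same center. The name best_anchor is hypothetical, so please correct me if this is not what actually happens.

```python
import numpy as np

def best_anchor(box_wh, anchors):
    """Hypothetical helper: pick the anchor whose shape best matches a ground truth box.

    box_wh:  (width, height) of the ground truth box
    anchors: (num_anchors, 2) as (width, height), in the same units as box_wh
    Returns the index of the best anchor and the IoU against every anchor.
    """
    box_wh = np.asarray(box_wh, dtype=float)
    anchors = np.asarray(anchors, dtype=float)

    # With both boxes centered at the same point, the overlap is min(width) * min(height).
    inter = np.minimum(box_wh[0], anchors[:, 0]) * np.minimum(box_wh[1], anchors[:, 1])
    union = box_wh[0] * box_wh[1] + anchors[:, 0] * anchors[:, 1] - inter
    iou = inter / union

    return int(np.argmax(iou)), iou
```

For example, with anchors (1, 1), (3, 1) and (1, 3), a wide ground truth box of size (2.5, 1.0) gets matched to the wide anchor at index 1.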