I have a question that is not addressed in this older thread and the two references cited there.
Older thread on localization loss
It seems that if an object is predicted, there are more loss terms (4 + number of classes), whereas if no object is predicted, there is only one loss term. Wouldn't this encourage a training outcome with lower recall, where falsely predicting no-object incurs less loss than predicting the object with an imprecise bounding box?
Notice that the original YOLO paper mentions that, during training, classification and localization mistakes are ignored if no object is actually present, or if an object is present but that detector location (grid cell plus anchor box) is not the one responsible for it:
Note that the loss function only penalizes classification error if an object is present in that grid cell (hence the conditional class probability discussed earlier). It also only penalizes bounding box coordinate error if that predictor is “responsible” for the ground truth box…
Otherwise, you end up influencing the location and classification learning when you really shouldn’t be.
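To make the masking concrete, here is a simplified sketch of a YOLO-style loss in NumPy. It is not the full YOLO loss (it omits the square-root on width/height, anchor assignment, and responsible-predictor selection by IOU); it assumes a flattened per-cell prediction layout `[x, y, w, h, confidence, class probs...]` and a precomputed `obj_mask` marking which predictors are responsible for a ground-truth object. The point it illustrates is that coordinate and class terms are zeroed out wherever `obj_mask` is false, so no-object predictors contribute only a down-weighted confidence term:

```python
import numpy as np

def yolo_style_loss(pred, target, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """Simplified YOLO-style loss (sketch, not the paper's exact formulation).

    pred, target: arrays of shape (num_predictors, 5 + num_classes), laid out
        as [x, y, w, h, confidence, class probabilities...].
    obj_mask: boolean array of shape (num_predictors,); True where that
        predictor is responsible for a ground-truth object.
    """
    coord_err = ((pred[:, :4] - target[:, :4]) ** 2).sum(axis=1)
    conf_err = (pred[:, 4] - target[:, 4]) ** 2
    class_err = ((pred[:, 5:] - target[:, 5:]) ** 2).sum(axis=1)

    obj = obj_mask.astype(float)
    noobj = 1.0 - obj

    # Coordinate and classification errors count only where an object is
    # present; no-object predictors contribute only the confidence term,
    # down-weighted by lambda_noobj so they do not dominate the gradient.
    loss = (lambda_coord * obj * coord_err
            + obj * conf_err
            + lambda_noobj * noobj * conf_err
            + obj * class_err)
    return loss.sum()
```

Because of the masking, perturbing the box coordinates or class scores of a no-object predictor leaves the loss unchanged, which is exactly the "ignored if no object is present" behavior described above.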