My understanding is that when ground-truth data is established, p_c = 1 for the one grid cell + anchor box responsible for the object's center and p_c = 0 for all others. It's always a challenge to talk about these things, in part because the notation differs between the class materials and the papers. Redmon et al. use Pr(object) for the object-presence probability; they don't use p_c at all. In the notebook markup it isn't completely clear whether p_c is treated as Pr(object) or Pr(object) * IOU(b, object). The language is either ambiguous or, since there is no mention of IOU in these parts of the notebook, perhaps leans toward Pr(object). I believe this interpretation is supported by the lectures and by these pieces of the exercise code:
```python
def yolo_head(...):
    """
    ...
    box_confidence : tensor
        Probability estimate for whether each box contains any object.
    ...
    """
    ...
    box_confidence = K.sigmoid(feats[..., 4:5])
    ...
    return box_confidence, ...
```
```python
def yolo_loss(..., rescore_confidence=False, ...):
    """
    ...
    rescore_confidence : bool, default=False
        If true then set confidence target to IOU of best predicted box with
        the closest matching ground truth box.
    ...
    """
    pred_xy, pred_wh, pred_confidence, pred_class_prob = yolo_head(...)
    # NOTE: the return params are out of order in the version of
    # keras_yolo.py I have from 2018
    ...
    no_objects_loss = no_object_weights * K.square(-pred_confidence)
    if rescore_confidence:
        objects_loss = (object_scale * detectors_mask *
                        K.square(best_ious - pred_confidence))
    else:
        objects_loss = (object_scale * detectors_mask *
                        K.square(1 - pred_confidence))
```
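To make the two branches concrete, here is a toy numeric sketch of my own (not the course code) of the object-confidence loss for a single responsible detector; the names `object_scale`, `detectors_mask`, `best_ious`, and `pred_confidence` mirror those in `yolo_loss`, and the values are made up:

```python
# Toy sketch (not the course code): the confidence loss for one detector
# under both settings of rescore_confidence. detectors_mask = 1 marks the
# (cell, anchor) responsible for an object; for all other detectors it is
# 0 and only the no-object term applies.
object_scale = 5.0
detectors_mask = 1.0        # this detector is responsible for an object
pred_confidence = 0.6       # sigma(t_o) predicted by the network
best_ious = 0.8             # IOU of best predicted box with ground truth

# rescore_confidence=False: the confidence target is 1
objects_loss_default = object_scale * detectors_mask * (1.0 - pred_confidence) ** 2

# rescore_confidence=True: the confidence target is the IOU itself
objects_loss_rescore = object_scale * detectors_mask * (best_ious - pred_confidence) ** 2

# objects_loss_default ~= 5 * 0.4**2 = 0.8
# objects_loss_rescore ~= 5 * 0.2**2 = 0.2
```

Note that even in the rescore branch the IOU enters as the regression *target* for the confidence, not as a multiplicative factor on it.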
None of which seems to me to align directly with this equation in the v2 paper:
Pr(object) * IOU(b, object) = \sigma(t_o)
In our code, since `box_confidence = K.sigmoid(feats[..., 4:5])` and t_o is `feats[..., 4:5]`, then Pr(object) == box_confidence == \sigma(t_o).
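A quick numeric check of that identity (a plain-numpy sketch of mine, not the notebook code): the raw logit t_o can be any real number, and the sigmoid squashes it into (0, 1), which is what lets it serve as a probability estimate:

```python
import numpy as np

def sigmoid(x):
    # Equivalent of K.sigmoid: squashes any real t_o into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# feats[..., 4:5] holds the raw t_o for each box; box_confidence is its sigmoid.
t_o = np.array([-3.0, 0.0, 3.0])
box_confidence = sigmoid(t_o)
# box_confidence is roughly [0.047, 0.5, 0.953] -- always strictly in (0, 1)
```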
In sum, my read is that across the class lecture, notebook markup, and our version of the v2 darknet code circa 2018, p_c means Pr(object). Despite showing up in parts of both the v1 and v2 papers, I can't find any support for p_c being Pr(object) * IOU(b, object) in our class materials. Rather, it is treated simply as the object-presence probability (or confidence), unless rescore_confidence=True in yolo_loss, in which case the interaction is still not multiplicative (the IOU becomes the regression target for the confidence rather than a factor in it). Despite the ambiguity around whether the IOU of the predicted bounding box is included in p_c in this course material, I think we do all agree that the final class scores are the product of p_c and c_i. You can see that in the implementation:
```python
def yolo_filter_boxes(...):
    ...
    box_scores = box_confidence * box_class_probs
    ...
    box_class_scores = K.max(box_scores, axis=-1)
    ...
```
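To illustrate that product numerically, here is a hypothetical two-box, three-class example of mine (not from the notebook), with p_c as `box_confidence` and c_i as `box_class_probs`:

```python
import numpy as np

# Hypothetical example of the final score computation in yolo_filter_boxes:
# per-class scores are the product p_c * c_i, then we take the best class.
box_confidence = np.array([[0.9], [0.4]])        # p_c for two boxes
box_class_probs = np.array([[0.7, 0.2, 0.1],     # c_i for box 0
                            [0.1, 0.8, 0.1]])    # c_i for box 1

box_scores = box_confidence * box_class_probs    # broadcasts to shape (2, 3)
box_class_scores = box_scores.max(axis=-1)       # best class score per box
box_classes = box_scores.argmax(axis=-1)         # index of that class

# box_scores       ~= [[0.63, 0.18, 0.09], [0.04, 0.32, 0.04]]
# box_class_scores ~= [0.63, 0.32]
# box_classes      ==  [0, 1]
```

A high Pr(object) with a confident class (box 0) scores much higher than a low Pr(object) with a confident class (box 1), which is exactly why the product is used for filtering.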
btw, it looks like there is a cut-and-paste artifact/typo in my quote above from the original paper: the word "pred" was incorrectly left over after I deleted the copy/paste of equation (1) from the paper. Sometimes the Discourse UI on the iPad gets wonky with emphasis fonts and LaTeX, but in any case I apparently didn't proofread well. My bad.
I welcome suggestions for clarification/correction.