Week 3 - Assignment 1 - Computation of Class Score: Why multiply Pc with C?

My understanding is that when ground truth data is established, p_c = 1 for the one grid cell + anchor box responsible for the object center and p_c = 0 for all others. It’s always a challenge talking about these things, in part because the notation differs between the class materials and the papers. Redmon et al. use Pr(object) for the object presence probability…they don’t use p_c. In the notebook markup it isn’t completely clear whether p_c is treated as Pr(object) or as Pr(object) * IOU(b, object). The language is ambiguous, but since these parts of the notebook never mention IOU, it leans towards Pr(object). I believe this interpretation is supported by the lectures and by these pieces in the exercise code…

def yolo_head():
    box_conf : tensor
        Probability estimate for whether each box contains any object.
    box_confidence = K.sigmoid(feats[..., 4:5])
    return box_confidence,...

def yolo_loss(...,rescore_confidence=False,...):
    rescore_confidence : bool, default=False
        If true then set confidence target to IOU of best predicted box with
        the closest matching ground truth box.
    pred_xy, pred_wh, pred_confidence, pred_class_prob = yolo_head(...) #NOTE: the return params are out of order in the version of keras_yolo.py I have from 2018

    no_objects_loss = no_object_weights * K.square(-pred_confidence)
    if rescore_confidence:
        objects_loss = (object_scale * detectors_mask * K.square(best_ious - pred_confidence))
    else:
        objects_loss = (object_scale * detectors_mask * K.square(1 - pred_confidence))

None of this seems to me to align directly with this equation in the v2 paper:

Pr(object) * IOU(b, object) = \sigma(t_o)

In our code, box_confidence = K.sigmoid(feats[..., 4:5]) and t_o is feats[..., 4:5], so Pr(object) == box_confidence == \sigma(t_o).
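
To make the two confidence targets in yolo_loss concrete, here is a small NumPy sketch (not the course code) of the object loss term for a single responsible box under both settings. The logit t_o and the IOU value are invented numbers, and object_scale = 5 is an assumed weight.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, the same squashing K.sigmoid applies to feats[..., 4:5]."""
    return 1.0 / (1.0 + np.exp(-x))

t_o = 2.0            # hypothetical raw network output for one box
best_iou = 0.75      # hypothetical IOU with the closest ground truth box
object_scale = 5.0   # assumed loss weight, for illustration only

pred_confidence = sigmoid(t_o)

# rescore_confidence=False: the target is plain object presence, i.e. 1.
loss_presence = object_scale * (1.0 - pred_confidence) ** 2

# rescore_confidence=True: the target is the IOU itself.
loss_rescored = object_scale * (best_iou - pred_confidence) ** 2

print(pred_confidence, loss_presence, loss_rescored)
```

In both branches the target is a single number (1 or the IOU), not a product of a presence probability and an IOU.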

In summary, my read is that across the class lecture, notebook markup, and our version of the v2 darknet code circa 2018, p_c means Pr(object). Although the product Pr(object) * IOU(b, object) shows up in parts of both the v1 and v2 papers, I can’t find any support for p_c being that product in our class materials. Rather, it is just treated as object presence probability (or confidence) unless rescore_confidence=True in yolo_loss, in which case the IOU replaces 1 as the confidence target, so the interaction is still not multiplicative. Despite the ambiguity around whether the IOU of the predicted bounding box is included in p_c in this course material, I think we do all agree that the final class scores are the product of p_c and c_i. You can see that in the implementation:

def yolo_filter_boxes(...):
    box_scores = box_confidence * box_class_probs
    box_class_scores = K.max(box_scores, axis=-1)
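
As a minimal sketch of that product, here is the same computation in plain NumPy rather than the Keras backend. The shapes and numbers are made up for illustration (two boxes, three classes), and the threshold of 0.6 is an assumed value.

```python
import numpy as np

# Hypothetical values: 2 boxes, 3 classes.
box_confidence = np.array([[0.9],
                           [0.2]])                 # p_c, shape (2, 1)
box_class_probs = np.array([[0.1, 0.7, 0.2],
                            [0.5, 0.3, 0.2]])      # c_i, shape (2, 3)

# Broadcasting multiplies each box's p_c across its class probabilities.
box_scores = box_confidence * box_class_probs      # shape (2, 3)

# Best class per box and the score of that class.
box_classes = np.argmax(box_scores, axis=-1)
box_class_scores = np.max(box_scores, axis=-1)

# Keep only boxes whose best class score clears the (assumed) threshold.
threshold = 0.6
keep = box_class_scores >= threshold

print(box_scores)
print(box_classes, box_class_scores, keep)
```

The second box has a reasonably confident class distribution, but its low p_c pulls every class score down, which is the point of multiplying the two.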

btw it looks like there is a cut-and-paste artifact/typo in my quote above from the original paper. The word pred was incorrectly left over after I deleted the copy/paste of equation (1) from the paper. Sometimes the Discourse UI on the iPad gets wonky when using emphasis fonts and LaTeX, but in any case apparently I didn’t proofread well. My bad.

Welcome suggestions for clarification/correction