I can't understand the function “yolo_filter_boxes” in the assignment. How are the shapes assigned, and what is the meaning of the mask at the end? Could someone explain it in steps?
The output shape of the neural network in this exercise is (19,19,5,85). It has been broken down into three separate Python objects:
(19,19,5,1) for the predicted confidence that an object is detected.
(19,19,5,4) for the predicted center location coordinates and bounding box shape.
(19,19,5,80) for the object class predictions.
This function works on each of the three separately but keeps them aligned, or synchronized.
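To make the three shapes concrete, here is a minimal NumPy sketch of the split. It is not the assignment's code: the random array stands in for the real network output, and the channel order (confidence first, then the 4 box values, then the 80 class scores) is an assumption made for illustration.

```python
import numpy as np

# Stand-in for the network output: random values with shape (19, 19, 5, 85).
rng = np.random.default_rng(0)
cnn_output = rng.random((19, 19, 5, 85))

# Assumed channel layout: [p_c, 4 box values, 80 class scores].
box_confidence = cnn_output[..., 0:1]    # (19, 19, 5, 1)
boxes = cnn_output[..., 1:5]             # (19, 19, 5, 4)
box_class_probs = cnn_output[..., 5:]    # (19, 19, 5, 80)

print(box_confidence.shape, boxes.shape, box_class_probs.shape)
```

The first three dimensions (grid row, grid column, anchor box) are identical in all three objects, which is what keeps them aligned.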
One step is to compute a weighted class prediction, corresponding to p_c * c_i in the notebook narrative. This is called box_scores.
Next, find the index of the highest weighted class prediction in box_scores.
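The two steps above can be sketched in NumPy as follows. The inputs are random placeholders, and the names box_classes and box_class_scores are only illustrative labels for the argmax and the max over the class axis.

```python
import numpy as np

rng = np.random.default_rng(0)
box_confidence = rng.random((19, 19, 5, 1))     # placeholder p_c values
box_class_probs = rng.random((19, 19, 5, 80))   # placeholder c_i values

# Broadcasting stretches the trailing 1 to 80, so every class score
# in a box is multiplied by that box's confidence.
box_scores = box_confidence * box_class_probs   # (19, 19, 5, 80)

# Reduce over the class axis: which class wins, and with what score.
box_classes = np.argmax(box_scores, axis=-1)    # (19, 19, 5)
box_class_scores = np.max(box_scores, axis=-1)  # (19, 19, 5)

print(box_scores.shape, box_classes.shape, box_class_scores.shape)
```

Note that the class axis disappears after the reduction, so each box is left with one class index and one score.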
To suppress low-confidence predictions, you use broadcasting to create a boolean mask that compares each box's highest score in box_scores to a scalar threshold. If the highest-valued class prediction is lower than threshold, mark the prediction False; otherwise, mark it True.
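Here is a tiny worked example of that comparison. The scores and the 0.6 threshold are made-up values on a 2x2 grid with one anchor box, chosen only so the result is easy to read.

```python
import numpy as np

# Hypothetical best score per box: 2x2 grid, 1 anchor box per cell.
box_class_scores = np.array([[[0.9], [0.2]],
                             [[0.6], [0.59]]])   # shape (2, 2, 1)
threshold = 0.6

# The scalar threshold is broadcast against every element.
filtering_mask = box_class_scores >= threshold

print(filtering_mask.shape)     # (2, 2, 1)
print(filtering_mask.tolist())  # [[[True], [False]], [[True], [False]]]
```

The mask has the same shape as the scores it was computed from, with one True/False per box.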
Apply the resulting boolean mask to the three aligned objects derived from the CNN output: the highest score per box, the box coordinates, and the index of the highest-scoring class. This results in three new, flattened objects. The trailing shape depends on which input it was (the scores, the coordinates, or the classes), and the leading dimension is the number of boxes that survived the threshold filtering.
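A rough NumPy sketch of this last step is below, using random placeholder inputs and an arbitrary 0.6 threshold. The assignment itself works with TensorFlow tensors, where tf.boolean_mask plays the role that boolean indexing plays here.

```python
import numpy as np

rng = np.random.default_rng(0)
box_class_scores = rng.random((19, 19, 5))            # best score per box
boxes = rng.random((19, 19, 5, 4))                    # box coordinates
box_classes = rng.integers(0, 80, size=(19, 19, 5))   # best class per box

filtering_mask = box_class_scores >= 0.6              # (19, 19, 5) booleans
n_kept = int(filtering_mask.sum())                    # boxes that survive

# Indexing with the mask flattens the three masked dimensions into one.
scores = box_class_scores[filtering_mask]   # (n_kept,)
kept_boxes = boxes[filtering_mask]          # (n_kept, 4)
classes = box_classes[filtering_mask]       # (n_kept,)

print(n_kept, scores.shape, kept_boxes.shape, classes.shape)
```

Because the same mask is used for all three, row i of each output still refers to the same surviving box.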