How can YOLO compute the pc confidence score at test time (inference), given that it is calculated as
the IOU between the predicted bounding box and the ground-truth bounding box?
But, during test time, you don’t have the ground truth boxes. How is it possible?
[Someone asked this on Quora, but I don’t understand the single answer given.]
The test set is also labeled data, so you have the ground truth.
Thanks for the answer. Yes, that makes sense for the dev set. But by “test” I mean new images in which we want to detect objects and for which we have no labels, i.e., a test set that is not the dev set.
That’s the whole point of supervised learning. You use a labeled training set to learn how to make predictions on new data that doesn’t have labels.
In the case that you are actually applying YOLO to new data where you don’t have labels, then you can’t calculate your prediction accuracy. Your self-driving car either crashes into a pedestrian or a tree or it doesn’t. That’s how you know whether YOLO worked in “real life” …
But just from a pure terminology point of view, that’s not what we call “test time”. As @Tmosh says, for all the phases we call training, cross validation (or “dev”) and “test” we have labelled data.
That’s not “test”. That’s “using the model to make predictions”.
Thanks both! My misunderstanding was assuming that the confidence score output by the model equals P(object) * the actual IOU between the predicted bounding box and the ground-truth bounding box.
It didn’t make sense to me at first that a predicted value would need the ground truth in order to be calculated, but now I understand that the model doesn’t compute the actual IOU; it predicts the product P(object) * IOU as a single value.
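To make that distinction concrete, here is a minimal sketch in plain Python. It assumes boxes in (x1, y1, x2, y2) corner format, and the function names (`iou`, `confidence_target`, `confidence_at_inference`) are made up for illustration; they are not taken from any YOLO code base. The idea is that the ground-truth box is only needed to build the training label for the confidence output, while at inference the score is just whatever the network emits.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def confidence_target(object_present, pred_box, gt_box):
    """Training-time label for the confidence output (needs ground truth).

    Hypothetical helper: the label is the IOU when an object is present
    and zero otherwise, i.e. P(object) * IOU with P(object) in {0, 1}.
    """
    return iou(pred_box, gt_box) if object_present else 0.0


def confidence_at_inference(network_confidence_output):
    """Inference-time score: the network's own output, no ground truth used."""
    return network_confidence_output
```

During training, the loss would push the network's confidence output toward `confidence_target(...)`; once trained, only `confidence_at_inference(...)` is relevant for unlabeled images.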
It may help to know that this question seems directed at two specific sentences in section 2 of the original YOLO paper:
“At test time we multiply the conditional class probabilities and the individual box confidence predictions,
$$\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \text{IOU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \cdot \text{IOU}^{\text{truth}}_{\text{pred}} \tag{1}$$
which gives us class-specific confidence scores for each box. These scores encode both the probability of that class appearing in the box and how well the predicted box fits the object.”
If so, an important clue about when this might occur is the last clause, “how well the predicted box fits the object.” That comparison can only take place if you have something to compare with, namely the ground-truth bounding box. What isn’t clear to me from the paper is where this evaluation is performed. Since it can’t be part of the core forward-propagation pipeline, it seems like it might be some post-processing done to assess accuracy during dev and/or test. You’d have to look at the YOLO v1 code base to confirm.
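For what it’s worth, the quoted sentence says the product uses the “box confidence predictions”, so one reading is that both factors in equation (1) are network outputs and no ground-truth box is involved at that step. Under that assumption, the arithmetic can be sketched as below. The function name `class_specific_scores` and the example numbers are hypothetical, chosen only to illustrate the multiplication.

```python
def class_specific_scores(class_probs, box_confidence):
    """Multiply each predicted Pr(Class_i | Object) by the predicted box confidence.

    class_probs: predicted conditional class probabilities for one grid cell.
    box_confidence: predicted confidence for one box in that cell, which the
        network was trained to make approximate Pr(Object) * IOU.
    """
    return [p * box_confidence for p in class_probs]


# Hypothetical network outputs for one box: three classes, one confidence value.
predicted_class_probs = [0.7, 0.2, 0.1]
predicted_box_confidence = 0.5

scores = class_specific_scores(predicted_class_probs, predicted_box_confidence)
best_class = max(range(len(scores)), key=lambda i: scores[i])
```

Here `best_class` is the index of the highest-scoring class for that box. Whether the original implementation organizes the computation this way would still need to be checked against the YOLO v1 code base.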