# Evaluation Metric

Hi Sir,

In the lecture "When to Change Dev/Test Sets and Metrics?", at 3:06, we were unable to understand the statement below. Can you please help us understand it?

The problem with this evaluation metric is that they treat pornographic and non-pornographic images equally

What is the problem with the evaluation metric, and what does it mean that the evaluation metric treats them equally?

Hi @Anbu,

From what I understand, think of it this way: you have a dataset from which you want to recognise cats. But the dataset, among other images, also contains pornographic images.

Now, Algo A has a 3% error, which means it classifies true cat images with 97% accuracy; but within that 3% error, it is classifying some porn images as cat images.

Algo B, on the other hand, classifies true cat images with only 95% accuracy; however, within its 5% error, it does not classify any porn images as cat images.

As Andrew mentioned, given this information, we know we want to use Algo B, because we don't want to show the customer any porn images, even though it has a higher error than Algo A.

As you can understand, showing porn images is completely unacceptable. So how could this have been avoided while training the model?

Now, coming to your question about "treating both images equally": in the formula Andrew first wrote (before adding the weight term "w"), the metric treats both kinds of images equally. What that means is that whenever the algorithm misclassifies any image (labelling an actual porn image as a cat, or labelling an actual cat image as non-cat), the error count goes up by exactly 1 either way. But we know showing porn images is totally unacceptable, so we want our evaluation metric to penalise a misclassified porn image much more heavily, so that the better algorithm (the one that never shows porn) actually scores better on the metric. This is why Andrew later added the weight term "w" to the formula.
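To make the idea concrete, here is a minimal sketch of such a weighted error metric in Python. The function name, labels, and the exact weight value (10 for porn images, 1 otherwise) are illustrative assumptions, not the lecture's exact code:

```python
def weighted_error(y_true, y_pred, is_porn, porn_weight=10):
    """Weighted misclassification error, as sketched in the lecture.

    y_true, y_pred: lists of labels (1 = cat, 0 = not cat)
    is_porn: list of booleans, True if the image is pornographic
    porn_weight: assumed penalty multiplier for mistakes on porn images
    """
    # Each image gets weight 1, except porn images which get porn_weight.
    weights = [porn_weight if porn else 1 for porn in is_porn]
    # A mistake on a porn image counts porn_weight times as much.
    mistakes = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Normalise by the total weight so the error stays between 0 and 1.
    return mistakes / sum(weights)
```

With `porn_weight=1` this reduces to the plain error rate, where every mistake counts as 1; raising the weight makes a single misclassified porn image dominate the score, which is exactly the behaviour we want from the metric.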

Hope this helps.
Mubsi