Scenarios for changing the dev/test sets (Course 3, Week 1)

When Prof. Ng talks about when to change our evaluation target, he refers to two cases:

  1. When the target (the evaluation metric plus the dev/test set) does not rank models in the order that matches our preferences, or
  2. When the model does not achieve a good evaluation metric on real, unseen user data, for example blurry, low-resolution images given to a cat classifier.
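To make case 1 concrete, one common way to change the metric is to make costly mistakes count more. Here is a minimal sketch, assuming made-up labels and a hypothetical per-example weight `w` (10 for mistakes we consider unacceptable, 1 otherwise). Plain error prefers model A, while the weighted error prefers model B.

```python
import numpy as np

def plain_error(y_true, y_pred):
    """Fraction of examples the model gets wrong; every mistake counts the same."""
    return float(np.mean(y_true != y_pred))

def weighted_error(y_true, y_pred, w):
    """Mistakes on examples with a larger weight w count more."""
    mistakes = (y_true != y_pred).astype(float)
    return float(np.sum(w * mistakes) / np.sum(w))

# Toy dev set (hypothetical labels): 1 = cat, 0 = not cat.
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
# Hypothetical weights: a mistake on examples 5 and 6 is 10x worse for users.
w = np.array([1, 1, 1, 1, 1, 10, 10, 1, 1, 1], dtype=float)

pred_a = y_true.copy()
pred_a[[5, 6]] = 1                          # 2 mistakes, both on heavily weighted examples
pred_b = y_true.copy()
pred_b[[0, 1, 7]] = 1 - y_true[[0, 1, 7]]   # 3 mistakes, all on ordinary examples

print(f"plain error:    A={plain_error(y_true, pred_a):.3f}  B={plain_error(y_true, pred_b):.3f}")
print(f"weighted error: A={weighted_error(y_true, pred_a, w):.3f}  B={weighted_error(y_true, pred_b, w):.3f}")
```

With these toy numbers the plain error is 0.200 for A and 0.300 for B, while the weighted error is 0.714 for A and 0.107 for B, so the preferred model flips once the metric reflects our preferences.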

He then adds that we should change the metric and/or the dev/test set. I need more information about that last part. What does it mean to change the test set? Suppose we built the entire model on high-resolution cat images from the internet and evaluated it on dev and test sets drawn from those same images. Re-splitting the existing high-res data into a new test set will not make the model perform any better on blurry cat images. I think we need to add more low-quality cat images to the data and then do the splitting. Am I right?

Yes, I think you are correct! You need to update the data to cover the cases where the model fails.
Hope this helps.
Thanks and regards,
Mayank Ghogale

Hey @mahsa_zarei,
Adding more low-quality cat images to the training set is one way, but not the only way, and that is why Prof. Andrew says we need to change the dev/test sets. Let's understand this more clearly.

If a model performs well on the dev/test sets but fails to perform well on unseen production data, our ultimate goal is not met. This means the dev/test sets do not represent the distribution of the unseen data.
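One way to notice this mismatch is to compare the dev metric with the same metric on a small labeled sample of production data. The sketch below assumes such a sample exists; the labels, predictions, the helper `dev_set_is_unrepresentative`, and its `tolerance` are all hypothetical and made up for illustration.

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def dev_set_is_unrepresentative(dev_acc, prod_acc, tolerance=0.05):
    """Hypothetical heuristic: flag a mismatch when accuracy on a labeled sample
    of production data trails dev accuracy by more than `tolerance`."""
    return (dev_acc - prod_acc) > tolerance

# Toy labels/predictions, made up for illustration only.
dev_true, dev_pred = [1, 0, 1, 1, 0, 1, 0, 1], [1, 0, 1, 1, 0, 1, 1, 1]
prod_true, prod_pred = [1, 1, 1, 0, 1, 0, 1, 1], [0, 1, 0, 0, 0, 0, 1, 0]

dev_acc, prod_acc = accuracy(dev_true, dev_pred), accuracy(prod_true, prod_pred)
print(dev_acc, prod_acc, dev_set_is_unrepresentative(dev_acc, prod_acc))
```

Here the dev accuracy is 0.875 but the production-sample accuracy is only 0.5, so the check flags the dev set as not representative of what users actually send.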

This is why we need to change the dev/test sets. Once we change them, we will find that the model does not perform well on the new dev/test sets, so we will continue developing the model at this stage instead of sending it into production. The new dev/test sets reflect the distribution of the unseen data more closely, and are therefore better datasets for estimating the model's real-world performance.
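Changing the dev/test sets can be as simple as collecting examples that look like the production data and splitting them so that dev and test come from the same distribution. A minimal sketch, assuming we already have such examples; the file names and the `make_dev_test` helper are hypothetical.

```python
import random

def make_dev_test(target_examples, dev_fraction=0.5, seed=0):
    """Shuffle examples collected from the target (production-like) distribution,
    then split them so the dev and test sets share that same distribution."""
    examples = list(target_examples)
    random.Random(seed).shuffle(examples)
    n_dev = int(len(examples) * dev_fraction)
    return examples[:n_dev], examples[n_dev:]

# Hypothetical file names standing in for user-uploaded, low-resolution cat photos.
user_photos = [f"user_upload_{i:03d}.jpg" for i in range(100)]
dev_set, test_set = make_dev_test(user_photos)
print(len(dev_set), len(test_set))
```

Shuffling before splitting avoids, for example, putting all photos from one time period or one source into dev and the rest into test, which would give the two sets different distributions again.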

Once you know that your model does not perform well on the dev/test sets, you can try various methods to fix it, one of which is to include low-quality images in the training set. You can also try other methods, such as augmentation techniques that reduce the resolution of the existing images, simple down-sampling, etc.
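As one example of the down-sampling idea, here is a minimal NumPy sketch that degrades existing high-resolution images by average-pooling and then upsampling with nearest-neighbour repetition. The function names and the `factors=(2, 4)` choice are hypothetical; real pipelines often use an image library's resize or blur instead.

```python
import numpy as np

def downsample_upsample(img, factor):
    """Simulate a low-resolution photo: average-pool `factor` x `factor` blocks,
    then upsample back by nearest-neighbour repetition. Rows/columns that do not
    fit a full block are cropped first. Returns a float array."""
    h, w = img.shape[0] // factor, img.shape[1] // factor
    img = np.asarray(img, dtype=float)[: h * factor, : w * factor]
    pooled = img.reshape(h, factor, w, factor, *img.shape[2:]).mean(axis=(1, 3))
    return np.repeat(np.repeat(pooled, factor, axis=0), factor, axis=1)

def augment_low_res(images, factors=(2, 4), seed=0):
    """Return a degraded copy of each image, using a randomly chosen factor."""
    rng = np.random.default_rng(seed)
    return [downsample_upsample(img, int(rng.choice(factors))) for img in images]

img = np.arange(16, dtype=float).reshape(4, 4)
print(downsample_upsample(img, 2))
```

The degraded copies can be added to the training set alongside the originals, so the model sees both sharp and blurry versions of each cat.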

So, what you are saying is indeed one of the next steps after what Prof. Andrew describes. In fact, even if you skip updating the dev/test sets and only modify the training set, you will still have to update the dev/test sets at some point, since only dev/test sets whose distribution matches that of the unseen data are true estimators of your model's real-world performance.

I hope this makes it clear why Prof. Andrew says this, and I hope it helps.