Hey @mahsa_zarei,
Adding more low-quality cat images to the training set is one way to address the problem, but it is not the only way, which is why Prof. Andrew says that we need to change the dev/test sets. Let’s look at this more closely.
If a model performs well on the dev/test sets but fails to perform well on unseen production data, our ultimate goal is not met. This means that the dev/test sets do not represent the distribution of the unseen data.
This is why we need to change the dev/test sets. Once we change them, we will find that the model doesn’t perform well on the new dev/test sets, so we will keep developing the model at this stage instead of sending it into production. The new dev/test sets represent the distribution of the unseen data more closely, and are therefore better datasets for estimating the model’s real-world performance.
Once you know that your model doesn’t perform well on the dev/test sets, you can try different methods to fix it, one of which is to include low-quality images in the training set. You can also try other methods, such as augmentation techniques that lower the resolution of the existing images, simple down-sampling techniques, etc.
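To make the down-sampling idea concrete, here is a minimal sketch of one possible augmentation, assuming images are stored as NumPy arrays of shape (H, W) or (H, W, C). The function name `degrade_resolution` and the `factor` parameter are hypothetical, chosen only for illustration; this is not the method from the course, just one simple way to simulate low-quality images.

```python
import numpy as np

def degrade_resolution(image: np.ndarray, factor: int) -> np.ndarray:
    """Simulate a low-resolution image (hypothetical helper, not from the course).

    Averages each factor x factor block of pixels, then repeats the averaged
    pixels so the output has the same size as the (cropped) input.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    h, w = image.shape[:2]
    # Crop so that height and width are divisible by factor.
    h_c, w_c = h - h % factor, w - w % factor
    cropped = image[:h_c, :w_c].astype(np.float32)
    # Down-sample: split into factor x factor blocks and average each block.
    blocks = cropped.reshape(h_c // factor, factor, w_c // factor, factor, -1)
    small = blocks.mean(axis=(1, 3))
    # Up-sample by pixel repetition, so fine detail stays lost.
    restored = small.repeat(factor, axis=0).repeat(factor, axis=1)
    restored = restored.reshape(cropped.shape)
    # Round before casting back to integer pixel types such as uint8.
    if np.issubdtype(image.dtype, np.integer):
        restored = np.rint(restored)
    return restored.astype(image.dtype)
```

You could apply something like this to a fraction of the training images, for example `degrade_resolution(img, 4)`. Note that the sketch crops a few border pixels when the image size is not divisible by `factor`.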
So, you see, what you are suggesting is indeed one of the next steps after what Prof. Andrew describes. In fact, even if you skip improving the dev/test sets and only modify the training set, you will still have to improve the dev/test sets at some point, since only dev/test sets whose distribution is similar to that of the unseen data give a reliable estimate of your model’s real-world performance.
I hope this makes clear why Prof. Andrew says this, and I hope it helps.
Regards,
Elemento