Does Data Augmentation apply only to train data?

From Andrew’s videos, I understood that data augmentation must be applied only to the train set, not to the dev or test sets. Is that correct?

I’ve worked with data augmentation before. After creating new data, I randomly selected the train, dev and test sets out of the combined data pool, including the augmented data. Is that the right way to do it, or should I add augmented data to the train set only and draw the dev and test sets from the data without augmentation?

Please advise.



Hi @mtorre,
welcome to Discourse!

The goal of data augmentation is to help the model generalize and learn more details of the images, so that during testing it handles the test data well. It is therefore standard practice to apply augmentation only to the training set.
In another video of the Deep Learning Specialization, Andrew Ng says:

“I’d encourage you to follow in this case is to make sure that the dev and test sets come from the same distribution. …because you will be using the dev set to evaluate a lot of different models and trying really hard to improve performance on the dev set. It’s nice if your dev set comes from the same distribution as your test set.”
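A minimal sketch of the "split first, then augment only the train set" workflow, in plain Python. The `augment` function (a horizontal flip on a toy image stored as a list of rows) and `split_then_augment` are hypothetical names for illustration, not from any course material. The key point is the order: the original data is split first, so dev and test contain only real, un-augmented samples from the same distribution. If you augment first and then split, a flipped copy of a training image can end up in dev or test, which tends to make the evaluation look better than it really is.

```python
import random


def augment(sample):
    """Hypothetical augmentation: return a horizontally flipped copy.

    A sample is assumed to be an (image, label) pair, where image is a list of rows.
    """
    image, label = sample
    flipped = [list(reversed(row)) for row in image]
    return (flipped, label)


def split_then_augment(samples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Split the ORIGINAL samples first, then augment only the training split."""
    rng = random.Random(seed)
    shuffled = samples[:]  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)

    n = len(shuffled)
    n_dev = int(n * dev_frac)
    n_test = int(n * test_frac)

    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]

    # Augmented copies go into the training split only; dev and test stay
    # un-augmented so they both come from the distribution of the real data.
    train_augmented = train + [augment(s) for s in train]
    return train_augmented, dev, test


if __name__ == "__main__":
    # Toy dataset: 100 tiny 2x2 "images" with binary labels.
    data = [([[i, i + 1], [i + 2, i + 3]], i % 2) for i in range(100)]
    train, dev, test = split_then_augment(data)
    print(len(train), len(dev), len(test))  # 160 10 10
```

With 100 original samples and 10% each for dev and test, 80 originals remain for training and doubling them with flipped copies gives 160 training samples, while dev and test keep their 10 real samples each.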



Thank you for your response. I appreciate it.