Does Data Augmentation apply only to train data?

mtorre · July 11, 2021, 4:15pm

From Andrew’s videos, I understood that data augmentation must be applied only to the train set, not to the dev or test sets. Is that correct?

I’ve worked with data augmentation before. After creating the new data, I randomly selected the train, dev and test sets from the whole pool, including the augmented data. Is that the right way to do it, or should I add augmented data to the train set only and draw the dev and test sets from the data without augmentation?

Please advise.

Thanks!

Mario

fabioantonini · July 12, 2021, 11:12am

Hi @mtorre,
Welcome to Discourse.

The goal of data augmentation is to help the model generalize by learning more of the variation in the images, so that it handles the test data well. It is therefore standard practice to apply augmentation only to the training set.
In another video of the Deep Learning Specialization, Andrew Ng says:

“I’d encourage you to follow in this case is to make sure that the dev and test sets come from the same distribution. …because you will be using the dev set to evaluate a lot of different models and trying really hard to improve performance on the dev set. It’s nice if your dev set comes from the same distribution as your test set.”
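
As a rough illustration of the advice above, here is a minimal sketch in plain Python that splits the original data first and only then augments the training split. The helper names (`horizontal_flip`, `split_then_augment`), the 60/20/20 split, and the toy 2x2 "images" are assumptions made up for this example, not anything taken from the course materials.

```python
import random

def horizontal_flip(image):
    """Return a mirrored copy of a 2-D image stored as a list of rows."""
    return [list(reversed(row)) for row in image]

def split_then_augment(samples, dev_frac=0.2, test_frac=0.2, seed=0):
    """Split original (image, label) pairs first, then augment only the train split.

    Hypothetical helper: the fractions and the single flip augmentation
    are illustrative choices, not a recommended recipe.
    """
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)

    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    dev = shuffled[:n_dev]                     # untouched originals
    test = shuffled[n_dev:n_dev + n_test]      # untouched originals
    train = shuffled[n_dev + n_test:]

    # Augmented copies are derived from training images only,
    # so no variant of a dev/test image can end up in the train set.
    augmented = [(horizontal_flip(img), label) for img, label in train]
    return train + augmented, dev, test

# Toy data: ten 2x2 "images" with distinct pixel values.
samples = [([[i, i + 1], [i + 2, i + 3]], i % 2) for i in range(0, 40, 4)]
train, dev, test = split_then_augment(samples)
print(len(train), len(dev), len(test))  # 12 2 2
```

Splitting after augmentation, by contrast, can place an image in the train set and its flipped copy in the dev or test set, which tends to make the evaluation look better than it really is.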

regards

mtorre · July 12, 2021, 12:16pm

Fabio,
Thank you for your response. I appreciate it.
Regards,
Mario
