Why don't we merge augmented data and original data?

In the programming assignment, we train by replacing the training data with augmented data.

In my opinion, we need to merge the training data and the augmented data.

Am I misunderstanding data augmentation?

Data augmentation is a technique for artificially expanding the size of a training set by creating modified copies of the existing data.
For more information visit


I thought the OP question was more or less ‘is the original training data set contained within the augmented data set?’ which I don’t see addressed by the first reply. Might be helpful to label the images, since it may not be immediately obvious what transformation has been applied and where, if at all, the original image is in that set. Just thinking out loud.


Thank you for replying to my question.

“However, we can improve the performance of the model by augmenting the data we already have.”
I have a question about the above part of your answer.

Did you use the word “augment” to mean “correction by supplementing the shortcomings in the training data”?


Thank you for replying, ai_curious.

It helps me.

This isn’t an area I have studied, but in reading the literature and TF documentation I can find easily, it seems like there isn’t a ‘rule’ about whether it’s a merge or a replacement. Many of the built-in capabilities from Keras use random transformations, especially for image manipulation, and generate them on the fly. However, my understanding is that you have the option of saving these generated images, so nothing (other than storage space) would prevent you from creating a merged data set. My intuition is that you’re doing augmentation generally when you have fewer training examples than is ideal, so not clear why you would throw out the original data…if you had more than you needed, you wouldn’t have started down the augmentation path in the first place. HTH
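To make the "merge" option concrete, here is a minimal plain-Python sketch, not the assignment's code and not a Keras API. It assumes images are small nested lists and uses a horizontal flip as the only transformation; `flip_horizontal` and `merge_with_augmented` are hypothetical helper names made up for illustration.

```python
def flip_horizontal(image):
    """Return a mirrored copy of an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def merge_with_augmented(originals):
    """Keep every original image and append one flipped copy of each."""
    augmented = [flip_horizontal(image) for image in originals]
    return originals + augmented


# Six tiny 2x2 "images" standing in for a real training set (toy data).
originals = [[[i, i + 1], [i + 2, i + 3]] for i in range(6)]
merged = merge_with_augmented(originals)

print(len(originals), "originals ->", len(merged), "examples after merging")
```

The originals stay in the set and the flipped copies are added on top, so the merged set is twice the size. With saved random transformations the idea would be the same, only the copies would differ from run to run.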


Hi @WJC,

You did understand the concept. It is exactly as you described.

We know that in deep learning, the more data we have, the better. So for example, if we have a data set of 150 images, that’s kind of a low number, right? So we perform data augmentation to increase the size of the data, and yes, ideally, we do include the newly generated images in our data set, to increase its size.

I’m not sure why the programming assignment is only using the augmented images. I’m sure there’s a reasonable explanation for it. Can you tell me which assignment it is so that I can take a look?



@Mubsi: The data augmentation case being discussed is in the Transfer Learning with MobileNet assignment. That’s C4 W2 A2.


Thank you for your reply as it really seems to work for me.

Personally, I think I need to study more while looking for a thesis in this field.

Thank you, Mubsi.

Any doubts I had have been resolved.


Hi @WJC,

After looking at the assignment:

We do not replace the original data with augmented data. Instead, we pass the data through the augmentation layer, which contains a random flip and a random rotation. Because the transformations are applied at random, on some images you will get the original image and on others you will get the transformed one, which is like merging the original and augmented data.
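As a rough illustration of that point, here is a plain-Python stand-in for a random flip. It is only a sketch: it is not the assignment's augmentation layer, it leaves out rotation entirely, and the names `random_flip`, `flip_horizontal`, and the probability `p=0.5` are assumptions chosen for the example.

```python
import random


def flip_horizontal(image):
    """Return a mirrored copy of an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def random_flip(image, rng, p=0.5):
    """Flip the image with probability p; otherwise return it unchanged."""
    return flip_horizontal(image) if rng.random() < p else image


rng = random.Random(0)  # fixed seed so the run is repeatable
image = [[1, 2], [3, 4]]

# Pretend the same training image is drawn once per epoch for 100 epochs.
views = [random_flip(image, rng) for _ in range(100)]
n_original = sum(view == image for view in views)
n_flipped = len(views) - n_original

print("original views:", n_original, "flipped views:", n_flipped)
```

Over many epochs the model sees both the untouched image and its flipped version, even though no separate augmented data set is ever stored.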

Hope this helps,