Does data augmentation increase the size of the training set?

Hello, I have this question.

When using data augmentation on a training set of 2000 images, will there still be just 2000 images in the training set?

Or are copies of the images created? For example, the original image is kept and a new, flipped image is added alongside it.

Thank you

Hello @Riccardo_Andreoni, welcome to the community :smiley:

As a matter of fact, the original 2000 images won’t be changed, as image augmentation doesn’t require you to edit your raw images. They will be loaded into memory, and there the augmentation operations will be performed on the fly during training using transforms.
As a result, the model sees a freshly transformed version of each image in every epoch, so over the course of training it effectively learns from more than 2000 distinct images without impacting your dataset.

You will gain new images by performing data augmentation. It is a powerful technique for reducing overfitting, since you expose your model to more varied versions of your data.
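
To make the on-the-fly behaviour concrete, here is a minimal, library-free sketch. It is not TensorFlow code: the toy dataset, `horizontal_flip`, `augmented_epoch`, and the 0.5 flip probability are all assumptions chosen for illustration. It shows that each epoch still yields exactly as many samples as there are stored images, while the set of distinct images seen across epochs grows and the stored data stays untouched.

```python
import random


def horizontal_flip(image):
    """Return a new image with each row reversed; the input is not modified."""
    return [row[::-1] for row in image]


def augmented_epoch(images, rng, flip_probability=0.5):
    """Yield one (possibly flipped) version of every stored image."""
    for image in images:
        if rng.random() < flip_probability:
            yield horizontal_flip(image)
        else:
            yield image


# Toy stand-in for a dataset: 2000 tiny 2x3 "images", all distinct.
dataset = [[[i, i + 1, i + 2], [i + 3, i + 4, i + 5]] for i in range(2000)]
snapshot = [[row[:] for row in image] for image in dataset]

rng = random.Random(0)
seen_per_epoch = []
distinct_variants = set()
for epoch in range(5):
    count = 0
    for image in augmented_epoch(dataset, rng):
        count += 1
        distinct_variants.add(tuple(map(tuple, image)))
    seen_per_epoch.append(count)

print(seen_per_epoch)          # one sample per stored image in every epoch
print(len(distinct_variants))  # distinct images seen across all epochs
print(dataset == snapshot)     # the stored images were never modified
```

In other words, the dataset does not get longer. The variety comes from drawing a new random transformation each time an image is loaded.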

Thank you for the reply, it’s very clear. Just to confirm: will the model learn from both the original pictures and the transformed ones?
If so, how can I control how many transformations are performed on a single image?
I mean, what tells TensorFlow to apply just one rotation to the image instead of N rotations plus some shears?
Thank you!
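
On the question of who decides which transformations are applied: in general it is the user, by choosing which random transforms go into the augmentation pipeline and how strong each one is (in Keras, for example, via random preprocessing layers such as `RandomFlip` and `RandomRotation`). The sketch below is a library-free illustration of that idea, not TensorFlow's implementation. `make_augmenter`, `flip_horizontal`, `rotate_90`, and the probabilities are hypothetical names and values. Each image passes through the configured pipeline once per load, and each step is applied or skipped at random.

```python
import random


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def make_augmenter(steps, seed=None):
    """Build an augmenter from (transform, probability) pairs, applied in order."""
    rng = random.Random(seed)

    def augment(image):
        applied = []
        for transform, probability in steps:
            if rng.random() < probability:
                image = transform(image)
                applied.append(transform.__name__)
        return image, applied

    return augment


# Pipeline 1: always apply exactly one rotation, nothing else.
rotation_only = make_augmenter([(rotate_90, 1.0)])

# Pipeline 2: maybe flip, then maybe rotate; each step is decided independently.
flip_then_rotate = make_augmenter(
    [(flip_horizontal, 0.5), (rotate_90, 0.5)], seed=0
)

image = [[1, 2], [3, 4]]
rotated, applied = rotation_only(image)
print(rotated, applied)

mixed, mixed_applied = flip_then_rotate(image)
print(mixed, mixed_applied)
```

Because the pipeline is fixed by the configuration, an image never receives "N rotations plus some shears" unless those steps were explicitly added.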

Is there any way to calculate the number of new images generated after augmentation?

Same question: how many new examples are generated by augmentation? The output display of…) shows 100/100 when each epoch ends. So it appears that the number of batches is still 100, the same as when augmentation is not used. Either TensorFlow adds extra batches behind the scenes that are not reflected in the 100 count, or it increases the batch size so that each batch includes more than 20 examples. Which is the case?
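
With on-the-fly random augmentation, neither is usually the case: the number of batches and the batch size stay the same, and only the pixel content of each batch changes from epoch to epoch. The arithmetic can be sketched as follows, using the 2000 images and batch size of 20 mentioned in this thread. The epoch count of 15 and the helper `steps_per_epoch` are assumptions made for illustration.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to go through every image once."""
    return math.ceil(num_images / batch_size)


num_images = 2000   # from the thread
batch_size = 20     # from the thread
epochs = 15         # hypothetical value, not stated in the thread

steps = steps_per_epoch(num_images, batch_size)

# Each stored image is loaded (and randomly transformed) once per epoch.
samples_per_epoch = steps * batch_size
total_samples_seen = samples_per_epoch * epochs

print(steps)               # batches shown in the progress bar each epoch
print(samples_per_epoch)   # same as the number of stored images
print(total_samples_seen)  # upper bound on distinct augmented images seen
```

So there is no fixed number of "new images" added to the dataset. The total number of augmented samples the model sees is the dataset size times the number of epochs, and how many of those are actually distinct depends on the randomness of the transforms.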

This is also my question (I just opened a topic for this…)

How many images do I have after augmentation?