In the week 2 lab, A Journey through Data, I noticed the training set stayed the same size after DA. But I thought augmentation means to augment, i.e., to increase the sample size by generating different flavors of the original images, thereby making the algorithm generalize better.
This post mentions that DA doesn’t change the training set size; rather, it only presents a different version of each image during each training epoch.
So can I say that DA doesn’t actually increase the training set size, since each epoch sees no more than one transformation of the same image?
It doesn’t change the original folder size or the input data; it creates the augmentations on the fly.
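Here is a minimal sketch of what "on the fly" means, assuming TensorFlow/Keras (the lab’s own code may differ; the tensors and layer choices here are illustrative). The random transforms are applied per element as the data streams through, so the stored dataset never grows:

```python
import tensorflow as tf

# Stand-in for 100 stored RGB images (64x64) with dummy labels.
images = tf.random.uniform((100, 64, 64, 3))
labels = tf.zeros(100, dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((images, labels))

# Random transforms applied per element, per epoch -- each pass
# just sees a differently perturbed copy of the same image.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
])

augmented = dataset.map(lambda x, y: (augment(x, training=True), y))

print(dataset.cardinality().numpy())    # 100
print(augmented.cardinality().numpy())  # still 100
```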
Let’s take an example.
Say your training set is a set of 100 pictures.
You apply DA to all 100: rotating them 90 degrees as one transformation and, say, converting them to grayscale (for lack of a better example). That means you apply 2 transformations to each of the 100 images, so your model reads 100 original pictures, 100 rotated ones, and 100 grayscale ones, for a total of 300.
For the simplicity of the example, an epoch would then read 300 samples to adjust the weights etc., while the training set on your hard drive, say, remains 100 images.
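A hypothetical sketch of that 100 → 300 count, again assuming TensorFlow (this is not the lab’s code); the transformed copies are materialized explicitly just to make the arithmetic visible:

```python
import tensorflow as tf

# Stand-in for the 100 pictures on disk.
originals = tf.data.Dataset.from_tensor_slices(tf.random.uniform((100, 64, 64, 3)))

# Transformation 1: rotate each image 90 degrees.
rotated = originals.map(tf.image.rot90)

# Transformation 2: grayscale (converted back to 3 channels so shapes match).
gray = originals.map(
    lambda img: tf.image.grayscale_to_rgb(tf.image.rgb_to_grayscale(img)))

# One epoch now iterates over 300 samples, while only 100 live on disk.
epoch_view = originals.concatenate(rotated).concatenate(gray)
print(epoch_view.cardinality().numpy())  # 300
```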
Essentially, your initial data does not change in volume; rather, the model gets the chance to learn from the same image through different lenses (i.e., learning different features).
As an example of where this matters: maybe you want your model to recognize faces even if they’re flipped or at an angle in an image. If you don’t apply any flipping and all 100 of your pictures are at the same angle, then your model has never seen a face at any angle other than that of the 100 images. You would then run into misleading performance and overfitting; in other words, the model would not generalize to the objective you want it to perform.
Hope that helps,