Course 4 Week 2: Data Augmentation (live vs increasing the dataset)

pmmaung · November 4, 2021, 2:09am

I would like to know the difference between increasing the number of images in the dataset via data augmentation and live augmentation.
Will increasing the number of images be better if we have a bigger network with more compute?
Is there any study on when to use which kind of data augmentation (increasing the dataset or live augmentation)?
Thank you.

paulinpaloalto · November 4, 2021, 3:01am

Data augmentation is a set of techniques you can use to increase your training set that don’t require you to acquire more new input images: you can modify the images you already have to make them useful for training. Acquiring new images can also be useful, but is sometimes more expensive or difficult than data augmentation techniques. Then you can apply the augmentation techniques to your new images as well. In other words, the two ideas are essentially independent and complementary.

pmmaung · November 4, 2021, 4:18am

Thank you for your explanation. What I meant by increasing the dataset is doing it through data augmentation (not by acquiring new images). Let me put it this way to be more concrete.

I have 50,000 images and three augmentation techniques.

1) I apply each augmentation technique to every image, producing 150,000 images, and train a network on those 150,000 images, or
2) I apply the three augmentation techniques live: I train the network on the 50,000 images, with the three augmentation techniques applied during training.

I would just like some insight into how to implement data augmentation to get the best model.

paulinpaloalto · November 4, 2021, 4:54am

Sorry, I don’t understand what you mean by your option 2). What is “live” augmentation? If you mean that you just create the augmented images “on the fly”, but don’t save them, then you are still training the network on 150,000 images, right? It’s just that you haven’t saved them statically in the training set.

Actually don’t you have 200,000 images total? The original 50k, plus the 3 variations produced by the augmentation techniques?
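A minimal NumPy sketch of option 1 (expanding the stored dataset) makes the count concrete. It assumes square `H x W x C` image arrays and uses three hypothetical augmentations (horizontal flip, vertical flip, 90° rotation) as stand-ins for the three techniques; `expand_dataset` is an illustrative name, not a library function. Keeping the originals yields 4N images, so 50,000 inputs become 200,000. A toy set of 50 images keeps memory small.

```python
import numpy as np

# Hypothetical stand-ins for "three augmentation techniques".
# Each assumes a square H x W x C image, so the output shape matches the input.
AUGMENTATIONS = [
    lambda img: img[:, ::-1, :],                  # horizontal flip
    lambda img: img[::-1, :, :],                  # vertical flip
    lambda img: np.rot90(img, k=1, axes=(0, 1)),  # 90-degree rotation
]


def expand_dataset(images, labels, augmentations=AUGMENTATIONS, keep_originals=True):
    """Option 1: precompute every augmented copy and store it with the originals."""
    image_parts = [images] if keep_originals else []
    label_parts = [labels] if keep_originals else []
    for augment in augmentations:
        image_parts.append(np.stack([augment(img) for img in images]))
        label_parts.append(labels)  # these augmentations do not change the class label
    return np.concatenate(image_parts), np.concatenate(label_parts)


rng = np.random.default_rng(0)
x = rng.random((50, 8, 8, 3))      # toy stand-in for the 50,000 training images
y = rng.integers(0, 10, size=50)   # toy class labels
x_big, y_big = expand_dataset(x, y)
print(x_big.shape, y_big.shape)    # (200, 8, 8, 3) (200,)
```

The expanded arrays are fixed, so every epoch revisits exactly the same 4N images.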

But also please note that augmentation techniques do not necessarily produce one-for-one output: if your augmentation is to incrementally rotate the images by a random angle, you could elect to use 3 or 11 or 42 such random angles per input image instead of just 1. It is a choice that you make.
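To illustrate that the number of augmented copies per image is a free choice, here is a sketch that draws `copies_per_image` random rotation angles for each input. The rotation is a simple nearest-neighbour implementation in plain NumPy so the example has no extra dependencies (a library routine such as `scipy.ndimage.rotate` would normally be used instead). The function names and the ±15° range are assumptions for illustration.

```python
import numpy as np


def rotate_nearest(img, angle_deg):
    """Rotate an H x W (x C) image about its centre using nearest-neighbour sampling.
    Output pixels that map from outside the original image are filled with zeros."""
    h, w = img.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.indices((h, w))
    # Inverse mapping: for each output pixel, find the source pixel it comes from.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[ys[valid], xs[valid]] = img[src_y[valid], src_x[valid]]
    return out


def random_rotations(images, labels, copies_per_image, max_angle=15.0, seed=None):
    """Make `copies_per_image` randomly rotated versions of every input image."""
    rng = np.random.default_rng(seed)
    out_images, out_labels = [], []
    for img, label in zip(images, labels):
        for angle in rng.uniform(-max_angle, max_angle, size=copies_per_image):
            out_images.append(rotate_nearest(img, angle))
            out_labels.append(label)
    return np.stack(out_images), np.array(out_labels)


x = np.random.default_rng(0).random((5, 8, 8, 3))
y = np.arange(5)
for k in (1, 3, 11):
    x_rot, y_rot = random_rotations(x, y, copies_per_image=k, seed=1)
    print(k, x_rot.shape)  # 5 * k images, each 8 x 8 x 3
```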

pmmaung · November 4, 2021, 5:18am

Thank you. Yes, you are right about option 1: when we include the original images, we train the network on 200,000 images.

For option 2, what I have seen in current implementations (e.g., for CIFAR-10) is that they train the network only on augmented images without increasing the number of images, so the count stays at 50,000. The images are augmented on the fly while the batches are loaded.
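Option 2 as described here can be sketched as a batch generator that augments each image while the batch is assembled, so every epoch still contains exactly N samples but the network sees a different random variant of each image each time. The pad-and-random-crop plus horizontal flip below is a pattern often seen in CIFAR-10 training code; the function name, the 2-pixel padding, and the NHWC layout are assumptions for this sketch.

```python
import numpy as np


def augmented_batches(images, labels, batch_size, rng, pad=2):
    """Yield one epoch of (images, labels) batches, augmenting on the fly.
    Nothing is stored: each call produces freshly augmented copies."""
    n, h, w = images.shape[:3]
    order = rng.permutation(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        batch = images[idx]  # fancy indexing returns a copy, so the originals stay untouched
        # Random crop: zero-pad the borders, then cut an h x w window at a random offset.
        padded = np.pad(batch, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
        for i in range(len(batch)):
            dy, dx = rng.integers(0, 2 * pad + 1, size=2)
            batch[i] = padded[i, dy:dy + h, dx:dx + w]
        # Random horizontal flip with probability 0.5 per image.
        flip = rng.random(len(batch)) < 0.5
        batch[flip] = batch[flip][:, :, ::-1, :]
        yield batch, labels[idx]


rng = np.random.default_rng(0)
x = rng.random((50, 8, 8, 3))   # toy stand-in for the 50,000 CIFAR-10 images
y = np.arange(50)
for epoch in range(3):
    seen = sum(len(yb) for _, yb in augmented_batches(x, y, batch_size=16, rng=rng))
    print(epoch, seen)  # 50 samples every epoch; the dataset size never grows
```

The trade-off is memory versus variety: a stored expansion is fixed, while the generator draws a new random variant of each image every epoch without storing anything.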

All the classification implementations I have seen so far apply augmentation as a pre-processing step (without increasing the dataset). I wonder whether increasing the dataset would work better than this pre-processing (live augmentation) for a bigger model.
