Let’s say I have 100 training images. If I apply data augmentation (for example, a horizontal flip), should I store the newly augmented set of 100 images alongside the original 100 images, resulting in a total of 200 images on my disk to train my model? Or should I train my model using the original 100 images while applying data augmentation on the fly during the training process? Thank you in advance!
Hi @jakhon77
It depends on your use case and resources:
- Dynamic data augmentation: This method saves disk space and provides diverse augmented images in every epoch.
- Pre-augmented storage: Useful if you need a fixed dataset for reproducibility or other reasons.
For most scenarios, on-the-fly augmentation is better.
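To make the storage difference concrete, here is a minimal pure-Python sketch of the pre-augmented option. The tiny nested lists stand in for images, and the helper name `hflip` is made up for illustration; a real pipeline would use an image library instead.

```python
def hflip(image):
    """Mirror an image (a list of pixel rows) left-to-right."""
    return [row[::-1] for row in image]

# 100 toy 2x3 "images" standing in for the original training set.
originals = [[[i, i + 1, i + 2], [i + 3, i + 4, i + 5]] for i in range(100)]

# Pre-augmented storage: create the flipped copies once and keep both sets.
pre_augmented = originals + [hflip(img) for img in originals]

print(len(originals), len(pre_augmented))  # 100 200
```

With on-the-fly augmentation only the 100 originals are stored, and the flip is applied when a sample is loaded.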
If I apply data augmentation during training (on the fly), the model will only see the augmented versions of the images and not the original ones. Is that still good?
Why do you think your model won’t see the original data? When applying on-the-fly data augmentation, the model sees both original and augmented versions of the images over multiple epochs because each augmentation is applied randomly at runtime, with some probability, so on any given pass an image may come through unchanged.
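A small sketch of that idea, assuming a random horizontal flip applied with probability 0.5 (the function `random_hflip` and the toy image are hypothetical, written in plain Python rather than taken from any specific library):

```python
import random

def random_hflip(image, rng, p=0.5):
    """Return the image mirrored left-to-right with probability p, else unchanged."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image

# A toy 2x3 "image" standing in for one training sample.
image = [[1, 2, 3],
         [4, 5, 6]]

rng = random.Random(0)
seen_original = 0
seen_flipped = 0
for epoch in range(1000):
    # The same stored image is loaded every epoch; only the transform is random.
    sample = random_hflip(image, rng)
    if sample == image:
        seen_original += 1
    else:
        seen_flipped += 1

print(seen_original, seen_flipped)
```

Over many epochs both counts are non-zero, so the model is exposed to the original and the flipped version of the same image.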
Could you clarify your concern?
I understand now. Since augmentation is applied randomly, it may or may not occur. Thank you very much for your clarification! You explained it very well!
Exactly. You’re welcome, happy to help!
Yes, that is the easiest method.
If you set a seed, then as long as you’re using the same random number generation engine (say, the same version of numpy/tensorflow/pytorch), the augmented images will be exactly the same on every run. As @TMosh mentioned, you may choose to store the original + augmented images for future use. However, if you change the seed you’re likely to get a different set of augmented images, and that’s when @Alireza_Saei’s suggestion is most relevant.
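The effect of the seed can be sketched with Python's standard `random` module. This is only an illustration under the assumption of a flip with probability 0.5; `augment_dataset` is a made-up helper, and numpy/tensorflow/pytorch each have their own seeding calls.

```python
import random

def augment_dataset(images, seed, p=0.5):
    """Flip each image left-to-right with probability p, using a seeded generator."""
    rng = random.Random(seed)
    return [
        [row[::-1] for row in img] if rng.random() < p else img
        for img in images
    ]

# 100 toy single-row "images".
images = [[[i, i + 1, i + 2]] for i in range(100)]

run_a = augment_dataset(images, seed=42)
run_b = augment_dataset(images, seed=42)
run_c = augment_dataset(images, seed=7)

print(run_a == run_b)  # True: same seed, same augmented set
print(run_a == run_c)  # compare against a run with a different seed
```

Two runs with the same seed make the same flip decisions, while a different seed will in practice pick a different subset of images to flip.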
Factors to consider: disk space, RAM/GPU memory, total run time for augmentation, and whether you set a fixed seed (leaving the seed unset is not recommended if you prefer reproducibility).
Since 100 is a small number it’s ok to do augmentation on the fly.