ImageDataGenerator is very effective at generating new images by applying augmentation on the fly (in memory).
I can get the number of images available for training (or validation) using something like:
len(train_data.filepaths)
But if I use augmentation, can I get the length of the “augmented dataset”? I think this would be very useful for getting full control of the training process.
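The augmented dataset has no fixed length, because ImageDataGenerator applies fresh random transformations to every batch it yields. From a usage perspective, it’s better to set the transformation parameters and let ImageDataGenerator generate up to NUM_IMAGES * NUM_EPOCHS unique images over the course of training (NUM_IMAGES per epoch).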
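To make that arithmetic concrete, here is a minimal sketch. The helper name augmentation_budget and the numbers (1000 images, batch size 32, 10 epochs) are made up for illustration. It assumes the generator yields ceil(NUM_IMAGES / BATCH_SIZE) batches per epoch, which is what len() reports for a Keras iterator. The total is an upper bound, since two random transformations can occasionally produce the same image.

```python
import math

def augmentation_budget(num_images, batch_size, num_epochs):
    """Return (batches per epoch, upper bound on unique augmented images)."""
    # One epoch covers every source image once; the last batch may be smaller.
    batches_per_epoch = math.ceil(num_images / batch_size)
    # Each epoch sees num_images freshly transformed images.
    max_unique_images = num_images * num_epochs
    return batches_per_epoch, max_unique_images

# Hypothetical numbers, not taken from the question.
batches_per_epoch, max_unique_images = augmentation_budget(
    num_images=1000, batch_size=32, num_epochs=10
)
print(batches_per_epoch, max_unique_images)
```

So the quantity you control is the number of epochs (or steps), not the size of a pre-built augmented dataset.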
Data augmentation takes place on the CPU when using ImageDataGenerator. If you’d like to control the randomness, flow_from_directory accepts a seed parameter, which seeds both the shuffling and the random transformations. Here’s an example of iterating over augmented images:
for epoch in range(NUM_EPOCHS):
    for batch_index, (augmented_images_batch, labels_batch) in enumerate(train_generator):
        # do work with augmented images
        if batch_index + 1 == len(train_generator):
            # the generator loops indefinitely, so stop after one full pass
            break
    # end of epoch
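As a rough illustration of the seed parameter mentioned above, the sketch below builds two generators with the same seed and checks that their first augmented batches match. It makes a few assumptions: it uses flow on in-memory arrays instead of flow_from_directory so that it needs no image files, the random arrays are stand-in data, first_batch is a hypothetical helper, and only flip augmentations are enabled.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in data (an assumption): 10 random 8x8 RGB "images" with binary labels.
rng = np.random.default_rng(0)
images = rng.random((10, 8, 8, 3)).astype("float32")
labels = np.arange(10) % 2

def first_batch(seed):
    """Create a fresh augmenting generator and return its first batch."""
    datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True)
    generator = datagen.flow(images, labels, batch_size=4, shuffle=True, seed=seed)
    return next(generator)

batch_a, labels_a = first_batch(seed=42)
batch_b, labels_b = first_batch(seed=42)

# With the same seed, both the shuffling order and the random flips repeat.
same_seed_matches = bool(
    np.array_equal(batch_a, batch_b) and np.array_equal(labels_a, labels_b)
)
print(same_seed_matches)
```

Leaving seed unset gives different augmentations on every run, which is usually what you want for training. Fixing it is mainly useful for debugging or for comparing runs.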