Why create two generators instead of one?

Any benefit of creating two separate image generators for rescaling here?

Instead of -

# All images will be rescaled by 1./255.
train_datagen = ImageDataGenerator( rescale = 1.0/255. )
test_datagen = ImageDataGenerator( rescale = 1.0/255. )

# --------------------
# Flow training images in batches of 20 using train_datagen generator
# --------------------
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    batch_size=20,
                                                    class_mode='binary',
                                                    target_size=(150, 150))

# --------------------
# Flow validation images in batches of 20 using test_datagen generator
# --------------------
validation_generator = test_datagen.flow_from_directory(validation_dir,
                                                        batch_size=20,
                                                        class_mode='binary',
                                                        target_size=(150, 150))

Can we do this -

img_gen = ImageDataGenerator(rescale=1/255)

train_gen = img_gen.flow_from_directory('/Users/sailsabnis/Downloads/coursera/cats_and_dogs_filtered/train',
                                        batch_size=20,
                                        class_mode='binary',
                                        target_size=(150,150))
validation_gen = img_gen.flow_from_directory('/Users/sailsabnis/Downloads/coursera/cats_and_dogs_filtered/validation/',
                                             batch_size=20,
                                             class_mode='binary',
                                             target_size=(150,150))

Did you try it? Could you do it? If not, perhaps each generator keeps some state in memory that doesn't allow it to be reused as a fresh component!

Hi there!

Yes, I did try it and it works smoothly. There's no need to create two datagens (test and train); we can simply create one datagen, say 'img_gen', and use it for both the train and test generators.

My question, though, was why Laurence chose to have two separate datagens. Is there any additional benefit we get out of that down the road?


No, I don't think there is any additional benefit, other than making it clearer for the learner!


It may work in this particular instance, but it could also be subtly wrong. Maybe you end up training on the validation data. How could you tell if that is what is happening?

We're in the deep end of the swimming pool here, doing sophisticated Object Oriented Programming. To get correct results, you need to be very clear about how the underlying TF classes are defined. When you have an instance of the ImageDataGenerator class and you invoke its flow_from_directory() method on that instance, does each call generate a separate object, or does it just reconfigure one existing object both times? You need to be sure of that for your version to work if your usage of those objects will be interleaved. I have not taken these deeper TF courses, just learned TF through DLS C2, C4 and C5, so I have not encountered that class definition before.

Maybe Laurence is just trying to set a good "style" example, so that you get in the habit of writing code that is more likely to work in the general case, and to protect you from harm in the case that you have not studied the definitions of the classes you are using with sufficient care. Or maybe he really knows that this is necessary in this particular case and your code is subtly wrong, but you don't have a way to tell.
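For what it's worth, a quick experiment along these lines would answer that question (this is just a sketch, assuming the standard tf.keras ImageDataGenerator API; the directory paths are placeholders):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_dir = 'cats_and_dogs_filtered/train'            # placeholder path
validation_dir = 'cats_and_dogs_filtered/validation'  # placeholder path

img_gen = ImageDataGenerator(rescale=1.0/255.0)

# Call flow_from_directory() twice on the same ImageDataGenerator instance
train_gen = img_gen.flow_from_directory(train_dir,
                                        batch_size=20,
                                        class_mode='binary',
                                        target_size=(150, 150))
validation_gen = img_gen.flow_from_directory(validation_dir,
                                             batch_size=20,
                                             class_mode='binary',
                                             target_size=(150, 150))

# If each call returns a fresh iterator object, the two generators are independent
print(train_gen is validation_gen)    # expected: False (two separate objects)
print(type(train_gen).__name__)       # expected: DirectoryIterator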


Thanks Paulin for taking the time out.

I don’t think we end up doing training on the validation data.

All it does is initialise a single instance (img_gen) of ImageDataGenerator, which normalises whatever is fed to it to values between 0 and 1.

Then,
train_gen = img_gen.flow_from_directory('train_data_path'…)
and
validation_gen = img_gen.flow_from_directory('validation_data_path'…)

Both of these lines dictate which data we are feeding (train vs validation).
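A quick way to convince ourselves of that (just a sketch, assuming the standard tf.keras DirectoryIterator attributes and reusing the train_gen / validation_gen created above):

# Each iterator keeps its own directory and file count, so the train/validation
# split is preserved even though both came from the same ImageDataGenerator.
print(train_gen.directory)        # the train folder
print(validation_gen.directory)   # the validation folder
print(train_gen.samples)          # number of images found under train
print(validation_gen.samples)     # number of images found under validation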

I agree it is more of a best practice to write clean code. Thank you!