Hello everybody,
I am learning to code CNNs with TensorFlow, and I have some follow-up questions about the train generator and batch size that were taught in the first week of the 2nd course of the TF specialization.
Let me describe the following scenarios:
Scenario 1
- I have a binary classification problem: Cat/Dog
- I have only a train folder; in the train folder I have subfolders “Cat” and “Dog”
- each subfolder has 1000 images (cats + dogs = 2000 images in total)
With 2000 images in the train folder, I code the train generator to have batch size = 20:
train_generator = train_datagen.flow_from_directory(
    batch_size=20 )
This means the train generator loads 20 images per batch, so I have to set the steps_per_epoch parameter in the fit method accordingly to run through all the data, correct?
This means there must be 100 steps per epoch to run through all 2000 images (100 * 20), am I right?
model.fit(steps_per_epoch = 100)
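To double-check the arithmetic, here is a minimal sketch in plain Python. The helper name `steps_for` is something I made up for illustration; it rounds up so that a final partial batch would still be counted.

```python
import math

def steps_for(num_images: int, batch_size: int) -> int:
    """Batches needed to see every image once (hypothetical helper)."""
    return math.ceil(num_images / batch_size)

print(steps_for(2000, 20))  # 100
print(steps_for(2010, 20))  # 101, the last batch holds only 10 images
```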
Prediction:
Then, let's say I do a prediction. The data fed to the train generator are loaded in alphabetical order of the subfolder names, so if the model predicts 0, then it is a cat. Is that correct?
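As far as I understand, flow_from_directory sorts the subfolder names and numbers them from 0, and the resulting mapping can be read from train_generator.class_indices. The snippet below only imitates that ordering in plain Python. The helpers `class_indices_from` and `label_for` and the 0.5 threshold are my own assumptions, and it assumes class_mode='binary' with a single sigmoid output.

```python
def class_indices_from(folder_names):
    """Imitate alphabetical label assignment (hypothetical helper, not a Keras function)."""
    return {name: idx for idx, name in enumerate(sorted(folder_names))}

class_indices = class_indices_from(["Dog", "Cat"])
print(class_indices)  # {'Cat': 0, 'Dog': 1}

# Reverse lookup to turn a predicted index back into a folder name.
index_to_class = {idx: name for name, idx in class_indices.items()}

def label_for(probability, threshold=0.5):
    """Map a sigmoid output to a class name; the threshold is an assumption."""
    return index_to_class[int(probability >= threshold)]

print(label_for(0.03))  # Cat
print(label_for(0.97))  # Dog
```

If that is right, an output close to 0 would mean cat and an output close to 1 would mean dog.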
Scenario 2: Imbalanced Data
Let’s say that the cat/dog data are imbalanced:
- in subfolder Cat there are 1000 images,
- in subfolder Dog there are only 500 images.
- Total number of images: 1500
With 1500 images in total, I set the batch size to 15 and steps per epoch to 100 to run through all examples. But with imbalanced data, if I am loading 15 images per batch into the model, are the cat/dog images loaded in proportion to each class's share?
If there are 1000 cat images and 500 dog images, does the generator load each batch as 10 cat images and 5 dog images?
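To illustrate what I am asking, here is a small simulation in plain Python, not the generator itself. It assumes the images are shuffled randomly before being cut into batches (I believe shuffle=True is the default in flow_from_directory, but I have not verified how each batch is composed), so the variable names and the seed are only illustrative.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

# 1000 cat labels and 500 dog labels, shuffled together.
labels = ["cat"] * 1000 + ["dog"] * 500
random.shuffle(labels)

# Cut the shuffled list into consecutive batches of 15.
batch_size = 15
batches = [labels[i:i + batch_size] for i in range(0, len(labels), batch_size)]

cats_per_batch = [Counter(batch)["cat"] for batch in batches]

print(len(batches))                              # 100
print(sum(cats_per_batch) / len(batches))        # 10.0 cats per batch on average
print(min(cats_per_batch), max(cats_per_batch))  # spread across individual batches
```

If this assumption holds, a batch contains 10 cats only on average, and individual batches can contain more or fewer.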
In the case of imbalanced data, should I always balance them so that both subfolders have the same number of images (e.g. by getting more data)? Could you please tell me what the “best practice” is in this regard?
I know these are kind of rookie questions, but I am trying to understand how the train generator and the model behave with different data.
Thank you very much for your answer.
Best Regards
Filip