In C2_W2 we use the ‘steps_per_epoch’ parameter while training the model. I would like to know why this parameter is used for this use case. How does this parameter help with the training? When should we be using this parameter? Are there any other parameter checks/settings we need to do while using it? I have read the tf/keras documentation for the same, but I would like a little more explanation to understand it better. Thank you.
Steps per epoch means how many batch gradient descent steps you want to have for one run over the dataset; it also depends on the batch size. The epoch will finish when that number of steps has been completed, even if not all of the data has been accessed during training.
It can be used when you want to perform only a certain number of steps per epoch, which can help speed up the overall training process.
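A minimal sketch of what this looks like in Keras; the model, the directory path, and the value of 100 for steps_per_epoch below are hypothetical placeholders, not the assignment's exact setup:

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# A tiny placeholder model, just enough to make fit() runnable.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(150, 150, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# batch_size lives on the generator, not on fit().
train_generator = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    'training/',              # hypothetical directory of class subfolders
    target_size=(150, 150),
    batch_size=100,
    class_mode='binary',
)

# Each epoch performs exactly 100 gradient steps (100 batches of 100
# images) and then ends, even if the generator could supply more data.
model.fit(train_generator, epochs=15, steps_per_epoch=100)
```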
What would be the reasoning behind not using all the data in an epoch? As in this example, we are augmenting the data in order to have more data. Can you please elaborate?
I am not clear on what actually happens when steps_per_epoch is specified. Thank you.
Your explanation is very helpful. I understand the relationship between epoch, steps, and batch_size better. However, I need further clarification about the following:
So we can mention either the ‘batch_size’ parameter or ‘steps_per_epoch’ while training the model.
Should the values of these parameters relate to, or be the same as, the ‘batch_size’ set in the train generator?
Can you please elaborate on how this plays out while using the data generators for augmentation? As in this lab,
case 1: the batch_size for the train generator is 128. And these 128 images are used per step in an epoch. Is this understanding correct? Or
case 2: all the transformed images are generated and stored in memory first, then processed according to the batch_size/steps mentioned in model.fit(). In this case, the training batch size could differ from the batch size mentioned in the generator. Also, wouldn’t this require enough memory to store the dataset plus the generated transformed images?
Thank you!
I will try to the best of my ability to address your questions.
Difference between the batch_size parameter and steps_per_epoch in relation to the data generator: steps_per_epoch indicates how many times you will fetch a new batch (of batch_size images) from the generator during a single epoch of training (in this assignment, the number of epochs is 15).
The original dataset is divided into cat and dog images.
So if you notice from the above images, there is a total of 12500 + 12500 = 25000 images (based on the original data), out of which 2 images are removed/ignored because of zero length. The data is then divided/split into training and validation datasets: 22500 (training) + 2500 (validation).
As you can see, while creating the training data generator a batch size of 100 was used with a target size of 150 x 150 = 22500 (the reason for using this target size is in the image below), and 5000 for the validation data generator.
You will notice we do not have 22500 images but 22498; the reason, as you know, is that two images were ignored/removed before creating the data generator.
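A sketch of the generator creation being described; the directory path is a hypothetical placeholder, while the batch size and target size follow the numbers above:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1.0 / 255)

# Prints "Found 22498 images belonging to 2 classes." -- 22500 minus the
# two zero-length files removed earlier.
train_generator = train_datagen.flow_from_directory(
    '/tmp/cats-v-dogs/training/',  # hypothetical path
    target_size=(150, 150),        # every image is resized to 150 x 150
    batch_size=100,                # each step fetches 100 images
    class_mode='binary',
)
```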
So now in model.fit, epochs is set to 15 and steps_per_epoch is 225: 225 steps x 100 (batch size) = 22500, which covers the 22498 total images generated by the training generator (2 fewer images, as mentioned above).
The last batch would have fewer than batch_size items and would be discarded.
However, in this case, it’s not a big deal to lose 1 image per training epoch. The same goes for the validation steps. To sum up: your models are trained [almost] correctly, because the quantity of lost elements is minor.
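Writing the arithmetic out makes the rounding explicit (plain Python, using the counts from above):

```python
total_images = 22498   # 22500 training images minus the 2 zero-length files
batch_size = 100
steps_per_epoch = 225

print(steps_per_epoch * batch_size)                 # 22500
print(steps_per_epoch * batch_size - total_images)  # 2 -> the 225th batch holds only 98 images
```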
steps_per_epoch indicates how many times you will fetch a new batch from the generator during a single epoch. Hence, steps_per_epoch is not the same as batch_size.
To answer your last question:
Again, if you notice, the model training (model.fit) is using a data generator with a predefined batch size of 100 (set when the generator was created). The generator feeds the batches into model.fit to train the model, so there is no requirement for extra memory to hold the stored data and the generated data separately. The training time, though, might differ based on the batch size and the number of images generated, and that is why learners see variation in training accuracy if the parameters are not chosen to match their model and data generator.
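One way to convince yourself of this, assuming the train_generator sketched earlier: pull a single batch by hand and note that only that batch is materialized in memory:

```python
# The generator is an iterator: each next() call loads and transforms
# exactly one batch of 100 images from disk, nothing more.
images, labels = next(train_generator)
print(images.shape)   # (100, 150, 150, 3)
print(labels.shape)   # (100,)
```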
Note in the above image that instead of passing batch_size we use the data generator, in which the batch size is already set to 100 (this is an ungraded code cell, not used for grading the assignment). The generator produces batches of 100 images from the 22498 (roughly 22500) training images; the model is then trained for 15 epochs, and each epoch consists of 225 steps (225 x 100 = 22500, roughly the total number of training images).
So in this case steps_per_epoch = 225, i.e. 22500 (22498 actual images) / 100 (batch size), and the model is trained with epochs set to 15.
steps_per_epoch = number_of_train_samples / batch_size, rounded up, where number_of_train_samples is the number of images produced by the data generator.
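In code this might look as follows, assuming a Keras iterator from flow_from_directory (it exposes the total sample count as .samples and the batch size as .batch_size):

```python
import math

number_of_train_samples = train_generator.samples   # 22498 here
batch_size = train_generator.batch_size             # 100

# Round up so the final, smaller batch still counts as one step.
steps_per_epoch = math.ceil(number_of_train_samples / batch_size)
print(steps_per_epoch)   # 225
```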
Hope this clears your doubt about the difference between steps_per_epoch and batch_size, and also about the data generator. Please read the complete comment in the previous post where I shared the link.
Thank you for the detailed explanation with examples. My question was based on the equation for steps_per_epoch (shared again below). Since one can be computed from the other, in general we can specify either steps_per_epoch or the batch_size parameter for model.fit(); mentioning both is not necessary. Right? Can you please confirm? I understand that in this particular example we use a data generator for which the batch_size has already been set to 100.
No, we cannot use batch_size in place of steps_per_epoch as a parameter for model.fit(). You can set batch_size on the data generator, but in model.fit() you use steps_per_epoch.
Deepthi, I just explained to you with a detailed explanation and example that batch_size is not steps_per_epoch.
An epoch is composed of many iterations (or batches).
Iterations: the number of batches needed to complete one Epoch.
Batch Size: The number of training samples used in one iteration.
Epoch: one full cycle through the training dataset. …
Number of Steps per Epoch = (Total Number of Training Samples) / (Batch Size)
So no, you cannot use batch_size instead of the steps_per_epoch parameter for model.fit().
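A side-by-side sketch of the two situations; the array case uses toy NumPy data, and the generator line is commented out because it assumes the train_generator from earlier:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Case 1: in-memory NumPy arrays -- here batch_size IS a model.fit() argument.
x = np.random.rand(1000, 4).astype('float32')
y = np.random.randint(0, 2, size=(1000,)).astype('float32')
model.fit(x, y, batch_size=100, epochs=1)

# Case 2: a data generator -- batch_size was fixed when the generator was
# created, so model.fit() takes steps_per_epoch instead:
# model.fit(train_generator, steps_per_epoch=225, epochs=15)
```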
An epoch is one complete pass through the entire dataset.
The batch_size is the number of samples in each of the smaller parts the dataset is divided into before being fed to the algorithm.
Determining the optimal values for epochs, batch size, and iterations can be a trial-and-error process. The usual approach is to start with a small number of epochs and a small batch size, then gradually increase both until you find the best balance between training time and performance.
I see. My bad. I do understand they are not the same. Maybe I am failing at articulating my question correctly. Thank you for taking the time to clarify again. I appreciate it. I will explore more with the exercises. Thank you Deepti.
I understood your question: you are asking whether one can use steps_per_epoch or batch_size interchangeably as a parameter in model.fit(), to which I answered no, as the two mean different things. Batch size is the number of samples in each batch the data is divided into, whereas steps_per_epoch is determined by dividing the size of the defined dataset (in your case, the data produced by the generator) by batch_size. So the batch_size set on the data generator cannot be used in place of steps_per_epoch in model.fit().
Hope you understood now.
If you are still confused, try it yourself: put batch_size in model.fit() for the same assignment you are asking about and see what results you get. Let me know.
In the example you shared above, steps_per_epoch is not mentioned, but it is computed based on the batch_size=100 set for the data generator. As you mentioned, steps_per_epoch here is 225, and all the data is used for training in an epoch.
We can set the steps_per_epoch parameter to a value < 225 if we want to speed up the training.
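For instance (a hypothetical value, reusing the model and generator sketched earlier in the thread):

```python
# Each epoch now stops after 100 of the 225 available batches
# (10000 of the ~22500 images), making every epoch faster.
model.fit(train_generator, epochs=15, steps_per_epoch=100)
```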
With this, I re-read your responses here in this thread and the one that you shared. I see that your explanations are trying to elaborate on this.
Further, I experimented with various values for (generator batch_size, steps_per_epoch) - ones that satisfy the equation and ones that don’t. I have also tried setting the batch_size parameter for model.fit(). I have found answers to all my questions and now understand @gent.spah’s response to my initial questions better. @Deepti_Prasad Thank you for your time and guidance. Appreciate it.
@Deepti_Prasad I get your frustration/shock at my complete lack of understanding of your explanations. Trying it out hands-on and re-reading your explanations afterward has helped answer my questions. As I mentioned previously, I appreciate your time and guidance. Yet I must mention that the details also made it more confusing, and your wording and emoticons were discouraging and uncomfortable. Thank you!