So I built a ResNet-50 model. Each epoch took 2 hours, so 100 epochs would take 200 hours, which is around 8 days for a single training run. I thought I would create a smaller model, so I dropped from 15 million parameters to 1.5 million. Each epoch now takes 1 hour and 45 minutes. The drop in parameters is significant, but training time dropped by only 15 minutes.
Any input on this?
Hi, @Marios_Constantinou!
When it comes to training deep learning models, there are a couple of things to take into account to speed up the process. First, I am going to go over the most common bottlenecks in the overall pipeline:
- Loading data from disk for each batch: if all the data is pre-loaded in RAM, loading is much faster, although this is not always possible due to memory restrictions. If that is your case, make sure the loading process itself is optimized.
- The evaluation process may take some time. It can be a good option to evaluate only every few epochs, not after every single one.
- Saving the model on each epoch. Similar to the previous point: checkpoint every few epochs instead. (See the sketch after this list for all three.)
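Here is a minimal sketch of what that can look like in PyTorch. The dataset, model, and the `eval_every`/`save_every` intervals are all placeholders I made up for illustration; you would swap in your own ResNet and data:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def main():
    # Dummy data and model just so the sketch runs end to end;
    # replace these with your own dataset and ResNet.
    train_ds = TensorDataset(torch.randn(1024, 32),
                             torch.randint(0, 10, (1024,)))
    model = nn.Linear(32, 10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()

    # Parallel workers + pinned memory keep the GPU from waiting on disk I/O.
    loader = DataLoader(train_ds, batch_size=64, shuffle=True,
                        num_workers=4, pin_memory=True)

    num_epochs, eval_every, save_every = 20, 5, 5  # hypothetical intervals

    for epoch in range(num_epochs):
        model.train()
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

        # Evaluate and checkpoint only every few epochs, not on every one.
        if (epoch + 1) % eval_every == 0:
            model.eval()
            # ... run your validation loop here ...
        if (epoch + 1) % save_every == 0:
            torch.save(model.state_dict(), f"checkpoint_{epoch + 1}.pt")


if __name__ == "__main__":
    main()
```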
Assuming everything else is optimized, the number of parameters is not the only thing that matters for runtime performance. You also have to consider how many FLOPs (floating point operations) each single forward pass needs and how well those ops parallelize on your hardware (throughput). Check Table 1 of Gao et al. for reference.
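A rough way to see this yourself is to count parameters and time forward passes directly. A minimal sketch (the two toy architectures below are just stand-ins I picked to make the point, not your models):

```python
import time

import torch
from torch import nn


def params_and_latency(model, x, warmup=3, iters=20):
    """Count parameters and measure average forward-pass time."""
    n_params = sum(p.numel() for p in model.parameters())
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):          # warm-up runs, not timed
            model(x)
        if torch.cuda.is_available():
            torch.cuda.synchronize()     # wait for queued GPU work
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        latency = (time.perf_counter() - start) / iters
    return n_params, latency


# Two toy models: "narrow" has ~8x fewer parameters than "wide", but its
# deeper sequential structure parallelizes worse, so it is not ~8x faster.
wide = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))
narrow = nn.Sequential(*[nn.Linear(256, 256) for _ in range(8)])

x_wide, x_narrow = torch.randn(64, 1024), torch.randn(64, 256)
for name, m, x in [("wide", wide, x_wide), ("narrow", narrow, x_narrow)]:
    p, t = params_and_latency(m, x)
    print(f"{name}: {p / 1e6:.2f}M params, {t * 1e3:.2f} ms/forward")
```

If you run something like this on your two models, you will likely see exactly what you observed: parameter count and wall-clock time per pass do not scale together.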
Gotcha, I will look into it!