Advanced Computer Vision with TensorFlow C3W1_Assignment taking really long to train

Sha_Hossain · December 21, 2023, 3:56am

This post is regarding the Week 1 assignment on Predicting Bounding Boxes.
I finished the assignment, training starts as expected, and I can see the loss going down with each epoch. But each epoch is taking 10 minutes, and the assignment asks us to produce a model with a 50-epoch run! I am not able to generate a final model and submit it.

Is anyone else having a similar issue?

Deepti_Prasad · December 21, 2023, 6:18am

You still haven't said why you are unable to submit your model. Kindly go back and check each step where you could make a change for better training.

Sha_Hossain · December 21, 2023, 6:39am

Sorry about the confusion. Since each epoch is taking 10 minutes, a 50-epoch run will require 500 minutes, which is ~8 hours. Is that expected, or is something broken?
Happy to share my Colab notebook if that would be useful for addressing the issue.

Thanks!

Deepti_Prasad · December 21, 2023, 7:15am

Is your runtime connected to a GPU?
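
Selecting a GPU runtime in Colab does not by itself guarantee that TensorFlow can use the GPU, so it can help to ask TensorFlow directly. The following is a minimal sketch, not part of the assignment; the helper name `visible_gpus` is made up for illustration, and the only TensorFlow call used is `tf.config.list_physical_devices`.

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is not installed.

    `visible_gpus` is a hypothetical helper name used only for this sketch.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow will fall back to the CPU.
    return tf.config.list_physical_devices("GPU")


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif gpus:
    print(f"TensorFlow sees {len(gpus)} GPU(s): {[g.name for g in gpus]}")
else:
    print("No GPU is visible to TensorFlow; training will run on the CPU.")
```

If the list comes back empty even though the runtime type is set to GPU, the slowdown is more likely an environment problem than a mistake in the model code.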

Sha_Hossain · December 21, 2023, 8:25pm

Yes, I am using a GPU.

Deepti_Prasad · December 21, 2023, 8:40pm

May I know why you are doing the assignment on a saved copy? I can see in the screenshot that you have two browser windows open with the same assignment.

The reason I am asking is that the browser window shown here shows that none of the cells have been run. So if this is not the original assignment you worked on, kindly check the runtime connection in the notebook where you are seeing the long training time.

Next, training time depends on various factors: your original data, the data preprocessing, the model architecture, and how the model is compiled. Check that you have used the correct optimizer as per the instructions given. These are general pointers for reducing your training time while still getting the desired accuracy and loss.

If all of these have been addressed correctly and you are still having the same issue, then send the downloaded notebook (.ipynb) via personal DM. Click on my name and then Message.

Regards
DP

Sha_Hossain · December 22, 2023, 4:40am

The assignment specifically asks us to create a copy, run it, and then upload the model.

I am double checking all the exercises. Will reach out if nothing else is found. Thanks!

Deepti_Prasad · December 22, 2023, 8:10am

Hello @Drew_Murray

Your code needs the following corrections:

1. When you define the TensorFlow Keras model using the inputs and outputs to your model, you used tf.keras.models.Model, which should be tf.keras.Model.
2. This is not an error, but your batch size is too big, which is increasing the training time. Try reducing it. The instructions for that exercise tell you to refer to the ungraded lab for hints. As I said in my earlier comment, keeping the model simple and choosing a batch size in relation to the dataset will help you get better results, because for this assignment training also depends on the steps per epoch, which in turn depends on batch_size. I cannot give you the direct answer for this; try working from the hints.
3. In the same exercise 6, where you derive the validation steps per epoch from the length of the validation dataset and batch_size, you need to do the same for the training set too. Most learners miss this important pointer in this assignment.
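
As a general illustration of how steps per epoch relate to dataset length and batch size, here is a minimal sketch. It is not the assignment's solution: the dataset sizes, the batch size, and the helper name `steps_for` are all hypothetical, and whether to round up or down depends on whether the final partial batch is kept.

```python
import math


def steps_for(num_examples, batch_size, drop_remainder=False):
    """Number of batches needed to cover num_examples once.

    Hypothetical helper for illustration; not taken from the assignment.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_remainder:
        # The final partial batch is discarded.
        return num_examples // batch_size
    # The final partial batch is kept as a smaller batch.
    return math.ceil(num_examples / batch_size)


# Made-up sizes, chosen only to show the arithmetic.
num_train_examples = 4000
num_val_examples = 1000
batch_size = 64

steps_per_epoch = steps_for(num_train_examples, batch_size)
validation_steps = steps_for(num_val_examples, batch_size)

print(steps_per_epoch, validation_steps)
```

Values like these are what would typically be passed to the `steps_per_epoch` and `validation_steps` arguments of Keras `model.fit`. If the training dataset repeats indefinitely and `steps_per_epoch` is set higher than one pass over the data, each epoch processes more than the full training set, which lengthens the time per epoch.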

Make these corrections and let me know if your issue is resolved.

@Sha_Hossain, check whether points 2 and 3 above have been done accordingly.

Regards
DP

Sha_Hossain · January 8, 2024, 3:16am

For posterity, the key issue is discussed here,

Also, the assignment now has this note: NOTE (12/30/2023): A recent update in Google Colab disables the use of GPUs for the required Tensorflow version in this course. With that, you might notice a slow training time per epoch (e.g. 5 minutes per epoch). While we update the grader, please use the fallback runtime instead so you can use the GPU and have faster training.

Deepti_Prasad · January 8, 2024, 6:49am

Yes, this issue is currently affecting the TensorFlow advanced specialization, especially Course 3 and Course 4. If anyone is getting stuck, try the fallback runtime solution, as the issue is caused by a mismatch between the required TensorFlow version and the Colab update.

Regards
DP
