Advanced Computer Vision with TensorFlow C3W1_Assignment taking really long to train

This post is about the Week 1 assignment on Predicting Bounding Boxes.
I finished the assignment, and training starts as expected: I can see the loss going down with each epoch. But each epoch takes 10 minutes, and the assignment asks us to produce a model trained for 50 epochs! I am not able to generate a final model and submit it.

Is anyone else having a similar issue?

You still haven't said why you are not able to submit your model. Kindly go back and check each step where you could make a change for better training.

Sorry about the confusion. Since each epoch takes 10 minutes, a 50-epoch run will require 500 minutes, which is about 8 hours. Is that expected, or is something broken?
Happy to share my Colab notebook if that would help in addressing the issue.
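
For reference, the estimate above works out as follows. This is only the arithmetic from the two numbers in this post (10 minutes per epoch, 50 epochs); the variable names are mine.

```python
# Rough wall-clock estimate from the numbers quoted in this post.
minutes_per_epoch = 10   # observed time per epoch
epochs = 50              # epochs requested by the assignment

total_minutes = minutes_per_epoch * epochs
total_hours = total_minutes / 60

print(f"{total_minutes} minutes, about {total_hours:.1f} hours")
# prints: 500 minutes, about 8.3 hours
```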

Thanks!

Is your runtime connected to a GPU?

Yes, I am using a GPU.
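
A runtime can be set to GPU in the Colab menu while TensorFlow still does not see the device, so it may be worth checking from inside the notebook. The snippet below is a minimal sketch, not part of the assignment: `visible_gpus` is a helper name I made up, and it relies on `tf.config.list_physical_devices`, which is available in TensorFlow 2.1 and later.

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or an empty list if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return tf.config.list_physical_devices("GPU")


gpus = visible_gpus()
print("GPUs visible to TensorFlow:", gpus)
if not gpus:
    # An empty list means training will silently fall back to the CPU.
    print("No GPU detected; training will run on the CPU.")
```

If the list is empty even though the runtime type is set to GPU, the slow epochs are probably coming from CPU training rather than from the model code.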

May I know why you are doing the assignment on a saved copy? I can see in the screenshot that you have two browser windows open with the same assignment.

The reason I am asking is that the browser shown here shows that none of the cells have been run. So if this is not the original notebook in which you did the assignment, kindly check the runtime connection in the notebook where you are seeing the long training times.

Next, training time depends on various factors: your original data, the data preprocessing, the model architecture, and how the model is compiled. Check that you have used the correct optimizer as per the instructions given. These are general pointers for reducing your training time while still getting the desired accuracy and loss.

If all of these have been addressed correctly and you still have the same issue, then send me the downloaded notebook (.ipynb) via personal DM. Click on my name and then Message.

Regards
DP

The assignment specifically asks us to create a copy, run it, and then upload the model.

I am double checking all the exercises. Will reach out if nothing else is found. Thanks!

Hello @Drew_Murray

Your code needs the following corrections:

  1. Define the TensorFlow Keras model using the inputs and outputs of your model.
    For this you used tf.keras.models.Model, which should be tf.keras.Model.

  2. This is not an error, but your batch size is too big, which is increasing the training time. Try reducing it. The instructions for that exercise tell you to refer to the ungraded lab for hints. As I said in my previous comment, keeping the model simple and the batch size in proportion to the dataset will help you get a better result, because training in this assignment depends on the steps per epoch, which in turn depends on batch_size. I cannot give you the direct answer for this, so try picking up the hints.

  3. In the same exercise 6, where you compute the validation steps per epoch from the length of the validation dataset and batch_size, you need to do the same for the training set too. Most learners miss this important pointer in this assignment.
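
To illustrate points 2 and 3 in general terms, here is a minimal sketch of how steps per epoch follow from the dataset length and the batch size. The numbers and the helper `steps_for` are hypothetical and chosen only for illustration; they are not the assignment's values or its expected solution.

```python
import math


def steps_for(num_examples, batch_size):
    """Number of batches needed to cover num_examples once (the last batch may be partial)."""
    return math.ceil(num_examples / batch_size)


# Hypothetical numbers for illustration only; not the assignment's values.
BATCH_SIZE = 64
length_of_training_dataset = 4800
length_of_validation_dataset = 1200

# Compute the step counts for BOTH splits, not just the validation set.
steps_per_epoch = steps_for(length_of_training_dataset, BATCH_SIZE)
validation_steps = steps_for(length_of_validation_dataset, BATCH_SIZE)

print(steps_per_epoch, validation_steps)  # prints: 75 19
```

In Keras, values like these are typically passed to `model.fit` through its `steps_per_epoch` and `validation_steps` arguments, so that each epoch covers the data once instead of running for an unintended number of batches.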

Make these corrections and let me know if your issue is resolved.

@Sha_Hossain, check whether points 2 and 3 that I mentioned have been done accordingly.

Regards
DP

For posterity, the key issue is discussed here,

Also, the assignment now has this note: "NOTE (12/30/2023): A recent update in Google Colab disables the use of GPUs for the required Tensorflow version in this course. With that, you might notice a slow training time per epoch (e.g. 5 minutes per epoch). While we update the grader, please use the fallback runtime instead so you can use the GPU and have faster training."

Yes, this issue is currently affecting the advanced TensorFlow specialization, especially Course 3 and Course 4. If anyone gets stuck, try the fallback runtime solution, as the issue is caused by a version and update mismatch.

Regards
DP