C4W4 Assignment: Running always defaults to CPU and crashes

Hi there!
For the Week 4 assignment of Generative Deep Learning with TensorFlow, the notebook crashes while training. The Google Colab backend always defaults to CPU. This is holding me back from completing the assignment.

Any help?

Thanks,
Vadiraj Mysore

I suppose your kernel is crashing.

One reason for a kernel crash could be a cell not being able to run its code within the allotted time.

Can you mention which cell causes this error when you run it?

Hi Deepti,
It is the training cell. After 3 epochs, it crashes saying, “Your session crashed after using all available RAM”. I experienced this with other weeks’ labs, too. Somehow, it got resolved miraculously!

Any help on this one?

I think this is more of an issue on the Google Colab side than on the DeepLearning.AI side.

I opted to pay for Colab Pro, and it solved the memory problems. I can select a High-RAM environment and better GPUs… but it is a personal choice.

After some years I am still paying for Colab Pro…

Sometimes you can avoid the crash by deleting previous, unused variables and clearing the GPU memory.

import gc
import torch

# Flush memory: drop references to large objects, then release cached GPU memory
del trainer, model, tokenizer
gc.collect()
torch.cuda.empty_cache()

The variable names are just examples…
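
The snippet above uses PyTorch's `torch.cuda.empty_cache()`, while this course's notebooks use TensorFlow. A rough TensorFlow-side equivalent might look like the sketch below. The helper name `release_memory` is hypothetical and not part of the assignment; the only library calls it relies on are `tf.keras.backend.clear_session()` and `gc.collect()`. Whether this frees enough RAM to avoid the crash is not guaranteed.

```python
import gc


def release_memory(namespace, names):
    """Delete the given variable names from a namespace dict and free memory.

    `release_memory` is a hypothetical helper, not part of the assignment.
    Returns the list of names that were actually removed.
    """
    removed = []
    for name in names:
        if name in namespace:
            del namespace[name]
            removed.append(name)

    try:
        import tensorflow as tf
        # Resets the global state that Keras keeps between model builds.
        tf.keras.backend.clear_session()
    except ImportError:
        # TensorFlow is not installed; only the Python-level cleanup runs.
        pass

    # Ask the garbage collector to reclaim the objects that were just unreferenced.
    gc.collect()
    return removed
```

In a notebook cell this could be called as `release_memory(globals(), ["old_model", "old_history"])`, where both names are placeholders for whatever large objects are no longer needed.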

Hi there,
Thanks for the response. First of all, when I try running the cells, at the very beginning, it tells me that it will be connecting to a CPU because I haven’t paid to use the GPU.

Since we pay for the Coursera course, aren’t we supposed to be granted access to GPUs so we can finish the assignments, at the very least?

Hi @VadiMysore

The issue with the Course 4 assignment on the free Colab tier is that you can run the model training once or twice, and after that it starts giving problems.

If you do not want to take the premium plan, I would advise you to try running the model again after a 24-hour cycle. Also, before rerunning the code, please make sure you haven’t hard-coded any paths and that your model complies with the instructions.

Let me know if the issue still persists the next day; then one of the mentors will need to have a look at your code. Alternatively, you could use the search tool to find previous threads on the same week's assignment and cross-check your code against them.

Regards
DP

Hi Deepthi,
There are no hard-coded paths to anything in my code. I have strictly followed the instructions given in the notebook.

On someone’s advice here, I even subscribed to the paid version of Colab, which I should not have. Yet, I get the very same issue even with the paid version!

Even if this paid version does not work, I will give this a 24 hrs break and try again as you suggested and post my status here again.

I only hope that this does not affect my grading and deadlines.

Also, this particular course has been having this issue with many assignments and it is frustrating. I hope Coursera and the others that they are working with resolve this issue soon.

Thanks!


Hi @VadiMysore

If you have already tried, then send me your notebook via personal DM. I will take another look at your code, but my response might be delayed a bit.

Please do not post any code or notebooks here on the public thread.

Also, please make sure to tag the right name. My name is Deepti_Prasad, and a mention of Deepthi would go to another member, who is on the DeepLearning.AI staff.
Type @ and you will get a list of the people commenting on this thread; just select the one you want to respond to.

Regards
DP

Hi @VadiMysore

Your code is fine. The only issue is that you are training for too few epochs. As the instructions mention:
The longer you train, the better your output fake images will be. You will pick your best images to submit to the grader.

Please train your model for 100 epochs and then let me know if you are still failing the grader. Please also make sure you haven’t used much of the CPU.

I understand your frustration, as this assignment does test your patience. I had to re-run the training over 3 days to pass the assignment.

Let me know if you still have issues.

Regards
DP

I find it ridiculous that Coursera has no say in the resources needed to complete an assignment. We pay for these courses. There is NOTHING wrong with my assignment. It is just that every time I run the training of the model, I run out of time. I cannot generate images of the expected quality. I even subscribed to Colab Pro. Even with GPUs, the training does not complete. I need to talk to someone who can solve this issue for me. Either grant access to all the resources needed to pass this assignment or change the evaluation criterion to evaluate ONLY the code and not the results.

@VadiMysore

Was the kernel crashing the issue this time, or was it CPU usage?

Regards
DP

Good morning, @Deepti_Prasad
I wish I could share my screen and show you what is happening. Despite getting additional compute resources from Google Colab Pro, the program still crashes. Despite setting Colab to use the GPU, it still defaults to using the RAM! And it dies after running a few epochs. Attached is the hardware config. I tried other GPU configs as well.

The TensorFlow version is 2.17.0.

The Keras version is 3.4.1.
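
One way to confirm from inside the notebook whether TensorFlow actually sees a GPU is to list the physical devices. This is only a minimal sketch: `describe_devices` is a hypothetical helper name, and `tf.config.list_physical_devices` is the only TensorFlow call it uses. As far as I know, the "all available RAM" message quoted earlier refers to the session's system RAM, so a visible GPU would not by itself rule out that crash.

```python
def describe_devices():
    """Report the TensorFlow version and the GPUs it can see.

    `describe_devices` is a hypothetical helper, not part of the assignment.
    If TensorFlow is not installed, the version is None and the GPU list is empty.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return {"tensorflow": None, "gpus": []}

    # An empty list here means TensorFlow will run everything on the CPU.
    gpus = tf.config.list_physical_devices("GPU")
    return {"tensorflow": tf.__version__, "gpus": [gpu.name for gpu in gpus]}


info = describe_devices()
print("TensorFlow version:", info["tensorflow"])
print("GPUs visible to TensorFlow:", info["gpus"] or "none (running on CPU)")
```

If the GPU list comes back empty even though a GPU runtime was selected, the runtime type setting is worth re-checking before starting a long training run.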