Course 4 Week 3 programming assignment: U-Net training causes kernel to die

I’m trying to complete the U-Net programming assignment in the in-browser Jupyter notebook workspace and have successfully completed all the code up to the “4 - Train the Model” section. However, (almost*) every time I try to run the first cell in this section to train the U-Net, the kernel dies.

From browsing the forum, I found this 2-year-old unresolved thread from someone experiencing what sounds like exactly the same issue.

*It ran successfully on one attempt, only to die shortly afterwards in one of the subsequent cells.

There are a couple of different things that can cause this behavior. It could be random “bad weather in the cloud”, but if it is happening predictably when you run a given cell, it is more likely that the memory image of your notebook is getting too large. Are you sure you didn’t add any print statements in the inner loops that make the output more voluminous than necessary?
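Even the built-in per-batch progress bar adds a lot of stored output over a long training run. As a minimal sketch (assuming the notebook trains with Keras model.fit; the names unet and train_dataset here are just illustrative, not necessarily the notebook’s exact variables), you can keep the saved output down to one line per epoch:

    # Illustrative only: unet and train_dataset stand in for whatever names the
    # notebook actually uses for the compiled model and the batched dataset.
    def train_quietly(unet, train_dataset, epochs=40):
        # verbose=2 prints one summary line per epoch instead of a per-batch
        # progress bar, which keeps the cell's stored output (and the
        # notebook's memory image) small.
        return unet.fit(train_dataset, epochs=epochs, verbose=2)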

To recover, try doing “Kernel → Restart and Clear Output” and then “Save”. After that, you should be able to submit. The grader doesn’t need to see your outputs, only your code.

I haven’t added any additional print statements or anything similar to the code. I’ve now submitted the code to the grader and it passes. I also tried running the U-Net training again today, and after a few attempts I managed to get through the rest of the notebook without the kernel dying. I have a hunch the “bad weather in the cloud” might be the platform not reliably provisioning the notebook with enough memory to run the U-Net training without crashing.

Yes, your provisioning theory sounds like a good explanation. I’m not sure how all that works on AWS, but it would be plausible that they have provisioning limits on the total resources available for running the notebooks and the graders at a given time, meaning that the behavior could be load dependent.
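If the crashes keep happening, one thing worth trying is reducing the memory footprint of the training itself. This is only a sketch, under the assumption that the notebook builds its training set with a tf.data pipeline; the helper name and default values are illustrative, not the assignment’s actual code:

    def make_train_dataset(image_ds, batch_size=16, buffer_size=250):
        # Hypothetical helper: rebuilding the dataset with a smaller batch size
        # and shuffle buffer lowers peak memory during unet.fit(), at the cost
        # of slower training.
        return image_ds.cache().shuffle(buffer_size).batch(batch_size)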

I get the same error when I train the U-Net model, but unfortunately when I try to submit the assignment it doesn’t pass. I tried downloading all the resources and training on my local machine, and everything works fine, no errors at all.

I also tried restarting the kernel, rebooting the server, and getting the latest version, but nothing worked. Please help me!

What is the symptom from the grader? Just because you pass the tests in the notebook does not mean your code is general. Perhaps you are failing the different test cases used by the grader.

The other possibility is that you didn’t save before Submit or that your memory image is too large. To get a pure experiment, please try these steps:

  1. Kernel → Restart and Clear Output
  2. Save
  3. Submit

Then if that still fails, please show us a screenshot of the grader output.

This is a screenshot showing that the kernel dies when fitting the model, without any error message:

And this is the result when I train the model on my local machine:


And when I submit the assignment, it does not pass, and the grader error shows:

Cell #8. Can’t compile the student’s code. Error: AssertionError(‘Error in test’)

I also checked cell #8 in the code and it works fine; all the tests pass.

Thanks for the detailed output. I think the kernel dying is probably just resource constraints on the servers. But there is a definite problem with your code if the grader gets an assertion failure. If everything runs successfully in the notebook, but fails the grader, it probably means you’ve “hard-coded” something so that it matches the test case in the notebook. Note that it’s not exactly clear what “cell #8” means. Probably it’s the 8th code cell in the notebook, but it’s not clear how the grader actually works. Please check your DMs for a message from me about how to proceed here.
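Just to make the hard-coding failure mode concrete, here is a purely illustrative sketch; the conv_block name and its arguments are hypothetical and not the assignment’s actual code:

    from tensorflow.keras.layers import Conv2D

    # Hard-coded: happens to pass the notebook's test (which uses 32 filters),
    # but fails the grader's test cases that call it with other values.
    def conv_block_hardcoded(inputs, n_filters):
        return Conv2D(32, 3, activation='relu', padding='same')(inputs)

    # General: uses the parameter, so it works for any test case the grader runs.
    def conv_block_general(inputs, n_filters):
        return Conv2D(n_filters, 3, activation='relu', padding='same')(inputs)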

Hey! I have the exact same problem :(. How can I proceed? Thank you!

Hi all,

The kernel-dying issue has been fixed. “Coursera has increased the memory allocated to the GPU assignments.”

“In order for the change to take effect for the learners, they would have to reboot their server.”

‘They can do this by clicking on the “Help” on the top right and when the panel opens, click on “Reboot”.’

If the issue still persists, let us know.

Best,
Saif.