'The kernel appears to have died. It will restart automatically' Error in my notebook

I don’t know of a way to specifically see what is killing the kernel. We just have to reason from behavioral evidence as best we can. As you can see from reading the earlier parts of this thread, the two things that we know of so far that can cause this are:

  1. Bad weather in the cloud.
  2. The memory image of the notebook grows too large for the resource limits that the notebook is provisioned with. This may also be affected by the general load from other Coursera notebooks.

Over how long a time span did you run your half dozen experiments? If it was just an hour or two, it would definitely be worth coming back tomorrow at a different time of day and trying again, on the theory that you may be affected by the current server load. Do you see any pattern in when the kernel dies as the training runs? E.g. does it happen after some number of iterations? I assume based on the earlier discussions that you’ve already checked that you’re not generating more output than necessary during training. Of course, other things besides printed output can increase the memory image, e.g. retaining gradients that you don’t really need, although the test cases typically check that you detach where you can.
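One way to look for a pattern in when the kernel dies is to log memory after every iteration, so the last line printed before the crash shows how far training got and how much had been allocated. Here is a minimal sketch using only the standard library. The names `run_with_memory_log` and `leaky_step` are hypothetical, and `leaky_step` is just a stand-in for a real training step. Note that `tracemalloc` only sees allocations made through Python's own allocator, so tensors allocated natively by a deep learning framework may not show up in these numbers.

```python
import tracemalloc


def run_with_memory_log(step_fn, num_steps, log_every=1):
    """Call step_fn(i) num_steps times and record traced memory.

    Returns a list of (step, current_bytes, peak_bytes) tuples.
    """
    tracemalloc.start()
    log = []
    try:
        for i in range(num_steps):
            step_fn(i)
            if i % log_every == 0:
                current, peak = tracemalloc.get_traced_memory()
                log.append((i, current, peak))
    finally:
        # Always stop tracing, even if a step raises.
        tracemalloc.stop()
    return log


# Hypothetical stand-in for a training step that keeps more than it needs:
# every call leaves roughly 100 KB alive in a list that is never cleared.
history = []


def leaky_step(i):
    history.append(bytearray(100_000))


for step, current, peak in run_with_memory_log(leaky_step, 5):
    # flush=True so the line reaches the notebook even if the kernel dies next.
    print(f"step {step}: current={current / 1e6:.2f} MB, "
          f"peak={peak / 1e6:.2f} MB", flush=True)
```

If the "current" figure climbs steadily from step to step, something is being retained across iterations. If it stays flat and the kernel still dies, the cause is more likely outside your code.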

Hello paulinpaloalto -

Appreciate the quick response! My colleague asserted that the probability of getting a response to a 2-year-old thread was vanishingly small. ;}

What I initially thought was a bit of humor on your part is the most likely explanation: “Bad weather in the cloud.”

A few specific aspects:
o the kernel died almost immediately (repeatedly)
o there was no superfluous output

The notebook is running successfully in a Google Colab instance (after uploading the train-labels.tif & train-volume.tif). Google Colab has a nifty feature: hovering over the RAM/DISK indicator in the upper right shows the details. As epochs run, there is a slight growth in RAM but nothing out of the ordinary:

At epoch 3: RAM-> 4.28 GB/12.68 GB, DISK-> 25 GB/78 GB
At epoch 197: RAM-> 4.31 GB/12.68 GB, DISK-> 25 GB/78 GB

So “bad weather in the cloud” appears to be the technically precise explanation. ;}
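For environments that lack Colab's RAM indicator, similar readings can be taken from inside the notebook. The sketch below assumes a Linux host that exposes `/proc/meminfo` with a `MemAvailable` field. The function names `parse_meminfo` and `ram_usage` are made up for illustration. A containerized notebook may also be subject to a lower memory limit than the one this file reports, so treat the total as an upper bound.

```python
from pathlib import Path


def parse_meminfo(text):
    """Return (used_gib, total_gib) from /proc/meminfo-style text.

    The file lists values in kB; "used" is computed here as
    MemTotal minus MemAvailable.
    """
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    total_kb = fields["MemTotal"]
    available_kb = fields["MemAvailable"]
    kb_per_gib = 1024 ** 2
    return (total_kb - available_kb) / kb_per_gib, total_kb / kb_per_gib


def ram_usage(path="/proc/meminfo"):
    """Return (used_gib, total_gib), or None if the file is not present."""
    meminfo = Path(path)
    if not meminfo.exists():
        return None  # e.g. not running on Linux
    return parse_meminfo(meminfo.read_text())


usage = ram_usage()
if usage is None:
    print("No /proc/meminfo on this system", flush=True)
else:
    used, total = usage
    print(f"RAM: {used:.2f} GiB / {total:.2f} GiB", flush=True)
```

Calling `ram_usage()` once per epoch gives a trace comparable to the two readings above.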

However, the kernel dies immediately when re-running the notebook in the Coursera cloud at other times (e.g. just now). The “dead kernel” message also asserts the kernel will restart automatically. Ha! It would be useful to have a public view of the Coursera cloud health & the utilization of a specific notebook as it runs.

This experience had the beneficial side-effect of nudging me to experiment with moving my Coursera notebooks out of the Coursera environment. I need to do this anyway in order to use these lessons in my own development.

Thanks again for the excellent demonstration of community support!

Hi, Handy-mat!

Well, this thread is only 6 days old by my reading, so we have no scientific evidence from this instance on the response probabilities with 2-year-old threads. :smile: But seriously, it really depends on who is “following” the thread and whether they are still active. Newly touched threads also show up under “New” in Discourse, so new activity even on archaeological threads can be noticed by the alert reader.

It sounds like you are already quite a sophisticated user of Colab, so you’ve probably figured everything out already, but one Coursera specific thing is that there is a simple way to capture all the files in any of the assignments so that you can duplicate them elsewhere. Here’s a very complete post from mentor Saif about all the details of getting things working on Colab. It also gives the details of how to download the assignment files in one shot.

Regards,
Paul

Appreciate the link & insights, paulinpaloalto!

Interestingly, the dead kernel at the start of training in the GAN C3W2A assignment (U-NET) (which triggered this thread) reoccurred in the PIX2PIX C3W2B assignment. I noticed a button labeled “Not Trusted” in the upper right. After I “trusted” the notebook, the kernel stayed alive & well throughout the rest of training.

So perhaps the Dead Kernel message could say:
“Kernel dies encountering untrusted notebook.” ;}

Thanks for your help.

I don’t understand what the “trusted” thing means. My observational experience suggests that it’s irrelevant. It works for me in either trusted or untrusted mode. Meaning that I think you are like the pigeons in the Skinner Boxes being given random reinforcement: you’re learning something from patterns that are actually meaningless. :sweat_smile:

You’re dead on right. I re-ran with it “Trusted” today - kernel died!

Is there any solution for this, or should I just skip it?

You’ve posted in an “AI Discussions” thread that has been cold for a year.

Which course are you attending?

It looks like you are running the Residual Networks assignment from DLS C4 W2. Have you read the earlier parts of this thread? E.g. this post?

As Tom says, you’re generally more likely to get timely answers if you post in the category of your course, so that the right people are more likely to notice. But my guess for this particular problem is that it is completely generic and can affect any course using Jupyter notebooks. So the post I linked should be relevant and is probably the best we can do in terms of an answer.

I apologize for all the trouble. I really appreciate your help. I will be more careful next time.