Kernel dies . ... w3a2 Image_segmentation_Unet_v2

Andrew_Wylde · February 20, 2024, 4:34am

Followed all the instructions, all my functions passed, no infinite loops or such. Training model fails with “Kernel has died, it will be restarted”. It has gotten as far as Epoch 3, but usually fails in 1 or 2, at different places. I have restarted Kernel and Checkpointed and saved my work and rebooted server many times over. Still fails. Result just kind of stops like this:

(TensorSpec(shape=(96, 128, 3), dtype=tf.float32, name=None), TensorSpec(shape=(96, 128, 1), dtype=tf.uint8, name=None))
Epoch 1/5
5/34 [===>…] - ETA: 8s - loss: 2.9687 - accuracy: 0.1911

Lab Id wcwzghbmwikn

Can you please help? Also seems very slow to respond.

AW

TMosh · February 20, 2024, 5:07am

There are two common reasons for the kernel dying.

The Jupyter server is too busy. “Try again later” seems helpful in this case.
Your notebook contains a lot of output data, and the kernel cannot digest it all. For this, try using “Kernel → Reset and Clear Output” followed by “File → Save and Checkpoint”, then Submit without running any cells.
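For anyone who would rather clear a notebook's stored output from a script than from the menu, here is a minimal sketch. It assumes the notebook file uses the nbformat 4 JSON layout (a top-level "cells" list), and the function names are made up for illustration. It is probably safest to run it while the notebook is closed, so the editor does not overwrite the result.

```python
import json


def clear_outputs(notebook: dict) -> dict:
    """Remove stored outputs from every code cell (modifies the dict in place)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []             # drop printed text, images, etc.
            cell["execution_count"] = None   # reset the In[n] counter
    return notebook


def clear_outputs_in_file(path: str) -> None:
    """Load a .ipynb file, clear its outputs, and write it back to the same path."""
    with open(path, "r", encoding="utf-8") as f:
        notebook = json.load(f)
    clear_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Calling clear_outputs_in_file("my_notebook.ipynb") (a hypothetical file name) should leave the code itself untouched and only shrink the file.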

If neither of them helps, then maybe there is an error in your code.

Andrew_Wylde · February 20, 2024, 7:05pm

Thank you for the quick response. For the benefit of anyone looking at this with a similar problem, I did NOT get a “Try again later” message or anything implying that the server was too busy, although that may well have been the case.

I also did not add any output beyond what the course's own test cells already produce. If I add output for the purpose of debugging, I always comment it out right away after I have figured out the issues.

That said, I ran each cell from the top to make sure there weren’t any errors in my code and that I hadn’t missed something. I did not change a thing in my code, and when I got to the training portion it trained without issue and I completed the lab. I actually trained it a second time just to make sure it wasn’t a fluke.

It seems that there was something wrong with the system (like low resources) that has been resolved now. It would be nice to get an error message of some type indicating that.

Thank you for your assistance.

TMosh · February 20, 2024, 8:00pm

Thanks for your report.

paulinpaloalto · February 20, 2024, 8:32pm

Well, I guess what your experiment has shown is that “Kernel has died, it will be restarted” is the error message indicating that.

Andrew_Wylde · February 20, 2024, 9:13pm

I guess so. I guess I could reword that to say a HELPFUL error message. If I see a stack overflow or attempt to access protected memory or such I would expect that it was an error in my code. But when the kernel just dies it is anyone’s guess.

So, just FYI, since I am not really a Python programmer and this is the first time I have used Jupyter, is there a way to check the status of your environment to see if there is a lack of resources for your program? I spent a lot of time trying to troubleshoot my own code; in fact, I had hoped to complete another module last night. I might just take a Python class after this so I could learn in a more orderly fashion rather than on the fly.

TMosh · February 20, 2024, 9:39pm

Maybe, but not that I’m aware of. I’m not much of a Jupyter expert.
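One possible way to get a rough picture from inside a notebook cell is sketched below, using only the Python standard library. It assumes a Linux host that exposes /proc/meminfo, and the helper names are hypothetical. On a shared or containerized lab server these numbers may describe the whole host rather than the limit applied to your session, so treat them as a hint, not a diagnosis.

```python
import os
import shutil


def read_meminfo(path="/proc/meminfo"):
    """Parse a Linux meminfo file into {field: KiB}; returns {} if it cannot be read."""
    info = {}
    try:
        with open(path) as f:
            for line in f:
                key, _, rest = line.partition(":")
                parts = rest.split()
                if parts and parts[0].isdigit():
                    info[key.strip()] = int(parts[0])
    except OSError:
        pass  # not Linux, or the file is not accessible
    return info


def resource_snapshot():
    """Collect a few coarse indicators of how loaded the machine is."""
    mem = read_meminfo()
    disk = shutil.disk_usage(".")
    snapshot = {
        "cpu_count": os.cpu_count(),
        "mem_total_mb": mem["MemTotal"] // 1024 if "MemTotal" in mem else None,
        "mem_available_mb": mem["MemAvailable"] // 1024 if "MemAvailable" in mem else None,
        "disk_free_mb": disk.free // (1024 * 1024),
        "load_average": None,
    }
    if hasattr(os, "getloadavg"):  # Unix only
        try:
            snapshot["load_average"] = os.getloadavg()  # 1, 5, 15 minute averages
        except OSError:
            pass
    return snapshot


if __name__ == "__main__":
    for name, value in resource_snapshot().items():
        print(f"{name}: {value}")
```

Running resource_snapshot() just before the training cell, and again after a crash, might show whether available memory was already low. It will not say why the kernel died.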

paulinpaloalto · February 20, 2024, 11:26pm

Of course you’re right in that and I assumed that was what you really meant. I was just making a small joke, but it’s not totally without a point: it is the case in general that error messages of all sorts (from compilers, e.g.) sometimes take some experience to interpret. We should advocate for improvements, but in the meantime we must deal with things as they are, not as we wish them to be.

Andrew_Wylde · February 21, 2024, 4:09am

Agreed, but if we accept mediocrity (or worse) that is what we get.

I understand you were making a joke. It’s just frustrating, as time is the one thing I don’t have enough of, and it’s frustrating to make time for the class and have it wasted by something like this.
