Kernel dying consistently in DLS 4/week 3 U-Net assignment

As of yesterday, I’ve consistently been getting a ‘Kernel Restarting/The kernel appears to have died. It will restart automatically.’ message when attempting to do the training (section 4) of the U-Net assignment. My tests in code cells above are all passing.

I’ve read other postings here and this looks like a common infrastructure issue (versus an issue with my TF code) although I’m not seeing recent messages for this assignment.

Is there anything I can do? Restarting the kernel is not helping.

Did you add any print() statements to your code for debugging?
If yes, try commenting those out.
Please report back here.

I’m seeing the same “kernel has died” issue.

I started from a fresh copy of the notebook, implemented the functions, passed all the unit tests, and I got “the kernel has died” in Section “4 - Train the model”.

I did not add any additional debugging code.

Seems like a Coursera platform problem.

I don’t think I have any debug output. I removed print statements from the block in 2.1 but I’m still getting the kernel issue. My training never gets past the first epoch.

I tried getting the latest version (Help | Get Latest Version) but this didn’t help. I’d like to reset my lab to its original state, but I don’t see a checkpoint in File | Revert to Checkpoint.

Any suggestions to debug?

I’m running my copy of the notebook right now, and it ran a few epochs of training in Section 4 before I got the “kernel has died” message.

I don’t know what the issue is.

I’m guessing this is a memory issue. I don’t see any way that the log files on the kernel can be exposed, nor am I seeing a CORE file or equivalent.
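
A rough way to test the memory theory from inside the notebook is to print the process's peak memory before and after the training cell. This is only a sketch: it assumes the lab runs on Linux or macOS (the standard-library `resource` module is not available on Windows), and `peak_memory_mb` is a hypothetical helper name, not part of the assignment.

```python
import resource
import sys


def peak_memory_mb():
    """Return the peak resident memory of this process, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


# Run this in a cell before training, and again after each attempt.
print(f"Peak memory so far: {peak_memory_mb():.0f} MB")
```

If the number climbs close to the container's limit just before the kernel dies, that would support the memory explanation, although it would not say whether the shortage is in the notebook itself or on the host.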

I think it might be possible to run the code directly (as a Python script) but without a GPU it may be very slow.
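
For anyone who wants to try the script route, here is a minimal sketch that pulls the code cells out of a notebook file using only the standard library. It relies on the fact that an `.ipynb` file is JSON with a `cells` list; the function name and the file names in the usage note are placeholders, and the handling of `%` and `!` lines is deliberately crude.

```python
import json


def notebook_to_script(notebook_path, script_path):
    """Write the code cells of a .ipynb file to a plain Python script.

    Returns the number of code cells written.
    """
    with open(notebook_path, encoding="utf-8") as f:
        notebook = json.load(f)

    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # "source" may be stored as a list of lines or as a single string.
        text = "".join(source) if isinstance(source, list) else source
        # Comment out IPython magics and shell escapes, which plain Python
        # cannot run. This is a simple prefix check, not a real parser.
        lines = [
            "# " + line if line.lstrip().startswith(("%", "!")) else line
            for line in text.splitlines()
        ]
        chunks.append("\n".join(lines))

    with open(script_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(chunks) + "\n")
    return len(chunks)
```

Calling `notebook_to_script("assignment.ipynb", "assignment.py")` (placeholder names) would produce a script that can be run with `python assignment.py`, provided the assignment's data files and helper modules are also available locally.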

I’ll ask the course staff to check if they see the same problem.

This is also happening to me while training the model. I think, as @vorpalsnark mentioned, it is a memory issue. Perhaps lots of learners are doing this assignment and not clearing the memory. Or it could be a server error.

Best,
Saif.

The issue still seems to persist. Is there anything we can do?

The working theory is that Coursera’s GPU array is starved for resources. They get GPU service from AWS, and so many people are eating GPU cycles playing with chat-bots that GPU time is in short supply.

I’ve heard some students have had success with (edited) running their notebook during off-peak hours.

But you can submit your assignment without running any code. The grader doesn’t need to see the output. If you have completed all the exercises, try submitting your assignment.

Best,
Saif.

Hi Saif, thanks for the reminder, I updated my reply to refer to issues with running the notebook, rather than submitting it.

That’s exactly what I did. I had a weird error, though, which I had to fix first because the grader couldn’t compile my code. I was afraid that it might be because of the kernel dying. Thank you for your reply :)

Kindly share the screenshot of the error with us.

Best,
Saif.

I got rid of it by rewriting my assignment. I believe the error was on my side. It couldn’t interpret one of my cells.
Error:
Cell #8. Can’t compile the student’s code. Error: AssertionError(‘Error in test’)

I am glad to hear that.
So, had you added some unusual code in your previous version that caused the kernel to die?

Best,
Saif.

Martin: I’m getting exactly the same error, even when I reverted to a fresh copy of the exercise.

Cell #8. Can’t compile the student’s code. Error: AssertionError(‘Error in test’)

Cell 8 tests (of the conv_block function) pass interactively, but fail in the AutoGrader. What change did you make?

(As an aside: the kernel is still dying consistently for me, regardless of what time of day I run the notebook.)

Coursera is working on the “dead kernel” issue.

FYI, this is still occurring. I got the “kernel has died” issue, but I was able to submit and pass the assignment. I guess I will have to wait until off-peak hours to see the results! :)

p.s. I’m not getting the second error that some others reported.

Sorry, I have not heard any updates from the course staff on this issue.