Unresponsive Kernel Issues in Jupyter Notebooks

I am facing a recurring issue when running my programming assignments. My kernel repeatedly dies, and the automatic grader then assigns a score of zero. I have tried running the notebook fresh in a new window, but that did not work. With my course deadline approaching, I have been unable to resolve this, so I would appreciate your help.


Do you have this problem on multiple assignments or just on one particular one? If the latter, then it may be that something in your code is causing the memory image of the notebook to become too large, e.g. print statements added for debugging inside one of the loops. One thing to try would be:

  1. Kernel → Restart and Clear Output
  2. Save
  3. Submit

The thing to realize is that the grader does not need to see your output: it only needs to call your functions and having a larger memory image because of the output can cause problems especially if the servers are heavily loaded at the time.
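
To make the point about output size concrete, here is a minimal sketch of what clearing output does to a saved notebook. It assumes the notebook is stored as nbformat 4 JSON; the clear_outputs helper and the tiny demo notebook are hypothetical and only for illustration. On the course platform, the menu steps above are still the way to go.

```python
import json


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of an nbformat-4 notebook dict with code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # simple deep copy of JSON data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop printed text, plots, etc.
            cell["execution_count"] = None  # reset the In[n] counter
    return cleaned


# Hypothetical two-cell notebook, used only for illustration.
demo = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["print('x' * 5)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["xxxxx\n"]}
            ],
        },
    ],
}

cleaned = clear_outputs(demo)
size_before = len(json.dumps(demo))
size_after = len(json.dumps(cleaned))
print(size_before, size_after)  # the cleaned notebook serializes to fewer characters
```

The code cells themselves are untouched, which is why the grader can still call your functions after the output is cleared.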

One other general thing to realize is that you needn’t worry about the deadlines: they are meaningless in the sense that there is no penalty for missing them. They just give you a “popup” asking you if you want to reset the deadlines. Just say “Yes” and you’re good to go …

Dear Sir,

I would like to thank you for your valuable feedback. The issue is specific to one lab assignment. I tried the solution you suggested, but unfortunately it did not fix the problem. It appears that the kernel dies partway through the grading process. I have also reviewed my code, inserting print statements to check it, and found no errors in the code cells I examined.

I would appreciate any further suggestions you may give to help resolve this issue.

Does this happen with the autograder? I mean, when you submit your assignment, do you get an error? If so, please share a screenshot with us.

Regarding the dead-kernel issue in the notebook, please try again at different times.

I appreciate your help. Here is what I am getting from the grader.

[ValidateApp | INFO] Validating '/home/jovyan/work/submitted/courseraLearner/W3A1/Tensorflow_introduction.ipynb'
[ValidateApp | INFO] Executing notebook with kernel: python3
2023-10-16 07:23:11.662813: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2023-10-16 07:23:11.662854: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2023-10-16 07:23:13.010038: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-10-16 07:23:13.010075: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2023-10-16 07:23:13.010098: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ip-10-2-121-167.ec2.internal): /proc/driver/nvidia/version does not exist
2023-10-16 07:23:13.010357: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-16 07:23:13.038139: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2999995000 Hz
2023-10-16 07:23:13.040276: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55da5285fff0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-10-16 07:23:13.040306: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
[ValidateApp | ERROR] Kernel died while waiting for execute reply.
[ValidateApp | ERROR] Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 478, in _poll_for_reply
    msg = self.kc.shell_channel.get_msg(timeout=timeout)
  File "/opt/conda/lib/python3.7/site-packages/jupyter_client/blocking/channels.py", line 57, in get_msg
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/nbgrader/preprocessors/execute.py", line 41, in preprocess
    output = super(Execute, self).preprocess(nb, resources)
  File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
    nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
  File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
    nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
  File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 438, in preprocess_cell
    reply, outputs = self.run_cell(cell, cell_index, store_history)
  File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 578, in run_cell
    exec_reply = self._poll_for_reply(parent_msg_id, cell, timeout)
  File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 483, in _poll_for_reply
    self._check_alive()
  File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 510, in _check_alive
    raise DeadKernelError("Kernel died")
nbconvert.preprocessors.execute.DeadKernelError: Kernel died

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/nbgrader/apps/validateapp.py", line 72, in start
    validator.validate_and_print(filename)
  File "/opt/conda/lib/python3.7/site-packages/nbgrader/validator.py", line 340, in validate_and_print
    results = self.validate(filename)
  File "/opt/conda/lib/python3.7/site-packages/nbgrader/validator.py", line 311, in validate
    nb = self._preprocess(nb)
  File "/opt/conda/lib/python3.7/site-packages/nbgrader/validator.py", line 290, in _preprocess
    nb, resources = pp.preprocess(nb, resources)
  File "/opt/conda/lib/python3.7/site-packages/nbgrader/preprocessors/execute.py", line 44, in preprocess
    raise UnresponsiveKernelError()
nbgrader.preprocessors.execute.UnresponsiveKernelError

[ValidateApp | ERROR] nbgrader encountered a fatal error while trying to validate 'submitted/courseraLearner/W3A1/Tensorflow_introduction.ipynb'

Please check a direct message from me for further instructions.

Okay! Thanks for sharing your notebook. I noticed that the kernel dies because of your code for Exercise 5 - forward_propagation. You are using NumPy functions like np.dot, which is mathematically correct. However, the instructions given to you are:

Note: Use only the TF API.

  • tf.math.add
  • tf.linalg.matmul
  • tf.keras.activations.relu

So, try using these. Or you can use the @ operator between two matrices, but don’t use NumPy.
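
As a rough illustration of the pattern (not the assignment solution), here is a single linear + ReLU step written with only the three TF functions listed above, followed by the equivalent @ form. The helper name linear_relu and the small constant tensors are made up for this sketch.

```python
import tensorflow as tf


def linear_relu(W, X, b):
    """Compute relu(W X + b) using only TensorFlow ops."""
    Z = tf.math.add(tf.linalg.matmul(W, X), b)
    return tf.keras.activations.relu(Z)


# Hypothetical values, chosen only so the result is easy to check by hand.
W = tf.constant([[1.0, -1.0], [0.5, 0.5]])
X = tf.constant([[2.0], [3.0]])
b = tf.constant([[0.0], [1.0]])

A = linear_relu(W, X, b)

# Same computation with the @ operator; on TF tensors it is still a TF op.
A_op = tf.keras.activations.relu(W @ X + b)

print(A.numpy())     # [[0. ], [3.5]]: W X + b is [[-1.], [3.5]] and ReLU zeroes the -1
print(A_op.numpy())  # same values
```

Because every step is a TF op, the result stays connected to the inputs in the compute graph.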

@paulinpaloalto sir! Any thoughts on why NumPy causes the kernel to die?

I’m not sure why using NumPy functions there would cause the kernel to die, but it is definitely a mistake and a problem. The issue is that training gets run later in the notebook, and that uses the TF mechanisms for automatically computing gradients and doing back propagation. That requires that all computations in the compute graph from the parameters to the cost be TF functions. If you put a NumPy function in the graph, it breaks the computation of gradients because TF only has the gradient logic for TF functions.
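
Here is a small sketch of that point using tf.GradientTape. The variable names and values are made up, and the NumPy result is deliberately wrapped back into a tensor so the comparison is clean. In this toy case the symptom is a missing gradient (None) rather than a dead kernel, so it shows the broken graph but does not explain the crash.

```python
import numpy as np
import tensorflow as tf

# Hypothetical parameter and input, for illustration only.
W = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
X = tf.constant([[1.0], [1.0]])

# All-TF path: the tape records matmul and reduce_sum, so a gradient exists.
with tf.GradientTape() as tape:
    loss_tf = tf.reduce_sum(tf.linalg.matmul(W, X))
grad_tf = tape.gradient(loss_tf, W)

# NumPy detour: W.numpy() leaves the graph, so the tape sees no path
# from loss_np back to W.
with tf.GradientTape() as tape:
    z = tf.constant(np.dot(W.numpy(), X.numpy()))
    loss_np = tf.reduce_sum(z)
grad_np = tape.gradient(loss_np, W)

print(grad_tf)  # 2x2 tensor of ones, since d(sum(W X))/dW = ones @ X^T
print(grad_np)  # None: no gradient can be traced through the NumPy call
```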

I have seen errors like this in the past when someone used labels.T to perform the required transpose in compute_total_cost rather than using tf.transpose, but my memory is that it gives an explicit message about not being able to compute gradients rather than just the kernel dying. I’ll go try that again and refresh my memory.

Thank you sir @paulinpaloalto and @saifkhanengr! Your guidance helped me complete my assignment and, at the same time, learn from this experience.

This is our very first introduction to TensorFlow and there is a lot to learn. It’s been quite a while since I listened to the lectures here, so I forget whether Prof Ng discusses how gradients are handled in TF in the lectures. They do mention it briefly in the notebook and tell you only to use TF functions, but they don’t really explain why.

Just for completeness, I went back and ran the experiment I described above of using the numpy transpose in compute_total_cost. It passes all the test cases in the notebook until you run the section later where it trains the model. Then it throws a big exception trace with this as the final error message:

ValueError: No gradients provided for any variable: ['Variable:0', 'Variable:0', 'Variable:0', 'Variable:0', 'Variable:0', 'Variable:0'].

So it doesn’t specifically say anything about “avoid numpy”, but it does tell you that there is some kind of problem with the gradients. As opposed to the kernel just dying …

I just continued the experiments and reimplemented forward_propagation using np.dot, and I get the same behavior: the kernel dies, just as Munim reported. Note that the test cell for forward_propagation does actually use gradient tape, so apparently that can also be the result if you try to compute TF gradients on a compute graph with non-TF functions.

Interesting! So we all learned something from this thread.
