Nbgrader error - Week 2 - Transfer Learning with MobileNet

I’m trying to submit Programming Assignment 2, but I’m running into the error below. All my tests pass in the notebook, so I’m guessing my solution is fine and the grader itself is crashing while grading my submission.

[ValidateApp | INFO] Validating '/home/jovyan/work/submitted/courseraLearner/W2A2/Transfer_learning_with_MobileNet_v1.ipynb'
[ValidateApp | INFO] Executing notebook with kernel: python3
2021-05-02 07:48:59.555993: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-05-02 07:48:59.556035: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-05-02 07:49:00.992638: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-05-02 07:49:00.992680: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2021-05-02 07:49:00.992707: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (somehost): /proc/driver/nvidia/version does not exist
2021-05-02 07:49:00.992999: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-02 07:49:00.999454: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3000000000 Hz
2021-05-02 07:49:01.000375: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f69f1258da0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-05-02 07:49:01.000409: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
[ValidateApp | ERROR] Kernel died while waiting for execute reply.
[ValidateApp | ERROR] Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 478, in _poll_for_reply
        msg = self.kc.shell_channel.get_msg(timeout=timeout)
      File "/opt/conda/lib/python3.7/site-packages/jupyter_client/blocking/channels.py", line 57, in get_msg
        raise Empty
    _queue.Empty
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/nbgrader/preprocessors/execute.py", line 41, in preprocess
        output = super(Execute, self).preprocess(nb, resources)
      File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
        nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
      File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
        nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
      File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 438, in preprocess_cell
        reply, outputs = self.run_cell(cell, cell_index, store_history)
      File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 578, in run_cell
        exec_reply = self._poll_for_reply(parent_msg_id, cell, timeout)
      File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 483, in _poll_for_reply
        self._check_alive()
      File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 510, in _check_alive
        raise DeadKernelError("Kernel died")
    nbconvert.preprocessors.execute.DeadKernelError: Kernel died
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/nbgrader/apps/validateapp.py", line 72, in start
        validator.validate_and_print(filename)
      File "/opt/conda/lib/python3.7/site-packages/nbgrader/validator.py", line 340, in validate_and_print
        results = self.validate(filename)
      File "/opt/conda/lib/python3.7/site-packages/nbgrader/validator.py", line 311, in validate
        nb = self._preprocess(nb)
      File "/opt/conda/lib/python3.7/site-packages/nbgrader/validator.py", line 290, in _preprocess
        nb, resources = pp.preprocess(nb, resources)
      File "/opt/conda/lib/python3.7/site-packages/nbgrader/preprocessors/execute.py", line 44, in preprocess
        raise UnresponsiveKernelError()
    nbgrader.preprocessors.execute.UnresponsiveKernelError
    
[ValidateApp | ERROR] nbgrader encountered a fatal error while trying to validate 'submitted/courseraLearner/W2A2/Transfer_learning_with_MobileNet_v1.ipynb'

I tried submitting it multiple times, sometimes after clearing the kernel and output, and with different combinations of the two. So far I have made 7 submissions roughly 1 hour apart, and they all run into this error.
Please suggest what’s going wrong and how I can fix it.

Hey @apparle, let me help you with this. I’m going to DM you about it.

Hey,

Please make sure that in Ex 2 you have frozen your base model, i.e. made it non-trainable. If you don’t, the base model starts to train, and the autograder exceeds its allotted memory, which kills the kernel and results in a failed submission.
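
For anyone else hitting this, here is a minimal sketch of what freezing a Keras base model looks like. It is not the assignment’s solution: the names `IMG_SHAPE`, `base_model` and `model`, the input size, and the single-unit head are assumptions chosen for illustration, and `weights=None` is used only so the snippet runs without downloading pretrained weights.

```python
import tensorflow as tf

# Assumed input size, for illustration only.
IMG_SHAPE = (160, 160, 3)

# weights=None keeps this sketch offline; a real transfer-learning setup
# would normally load pretrained weights instead.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SHAPE,
    include_top=False,
    weights=None,
)

# Freeze the base model so none of its weights are updated during training.
base_model.trainable = False

# Hypothetical small head on top of the frozen base.
inputs = tf.keras.Input(shape=IMG_SHAPE)
x = base_model(inputs, training=False)  # keep BatchNorm layers in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)

# Quick check: the frozen base should contribute no trainable variables.
print("trainable variables in base model:", len(base_model.trainable_variables))
print("trainable variables in full model:", len(model.trainable_variables))
```

If the first count is not zero, the base model is still trainable, and training it needs far more memory because gradients and optimizer state are kept for every base-model weight.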

That solved the problem, thank you.