Grader timeout in C1W4 assignment

I am getting the following error from the grader in the C1W4 assignment. Is it caused by my code or by a problem with the grader?

[ValidateApp | INFO] Validating '/home/jovyan/work/submitted/courseraLearner/W4_Assignment/C1W4_Assignment.ipynb'
[ValidateApp | INFO] Executing notebook with kernel: python3
2021-09-14 06:38:53.473039: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2021-09-14 06:38:53.473151: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2021-09-14 06:38:53.473162: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2021-09-14 06:38:54.702695: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-09-14 06:38:54.702727: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
2021-09-14 06:38:54.702755: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ip-10-2-6-127.ec2.internal): /proc/driver/nvidia/version does not exist
2021-09-14 06:38:54.702930: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-09-14 06:38:54.732858: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2999995000 Hz
2021-09-14 06:38:54.734631: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x558e8ee8d1d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-09-14 06:38:54.734661: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
[ValidateApp | ERROR] Timeout waiting for execute reply (30s).
[ValidateApp | ERROR] Interrupting kernel
[ValidateApp | ERROR] Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 589, in run_cell
        msg = self.kc.iopub_channel.get_msg(timeout=timeout)
      File "/opt/conda/lib/python3.7/site-packages/jupyter_client/blocking/channels.py", line 57, in get_msg
        raise Empty
    _queue.Empty
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/nbgrader/apps/validateapp.py", line 72, in start
        validator.validate_and_print(filename)
      File "/opt/conda/lib/python3.7/site-packages/nbgrader/validator.py", line 340, in validate_and_print
        results = self.validate(filename)
      File "/opt/conda/lib/python3.7/site-packages/nbgrader/validator.py", line 311, in validate
        nb = self._preprocess(nb)
      File "/opt/conda/lib/python3.7/site-packages/nbgrader/validator.py", line 290, in _preprocess
        nb, resources = pp.preprocess(nb, resources)
      File "/opt/conda/lib/python3.7/site-packages/nbgrader/preprocessors/execute.py", line 41, in preprocess
        output = super(Execute, self).preprocess(nb, resources)
      File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
        nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
      File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
        nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
      File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 438, in preprocess_cell
        reply, outputs = self.run_cell(cell, cell_index, store_history)
      File "/opt/conda/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 597, in run_cell
        raise TimeoutError("Timeout waiting for IOPub output")
    TimeoutError: Timeout waiting for IOPub output
    
[ValidateApp | ERROR] nbgrader encountered a fatal error while trying to validate 'submitted/courseraLearner/W4_Assignment/C1W4_Assignment.ipynb'

Normally this means there is a problem somewhere in your code!

Hi Gent, thanks for the quick reply. My code passes all the in-notebook checks, and in principle I can run the training, although it seems prohibitively slow (~20 s per step). I have gone over the code again and did not find any obvious mistakes. Is there any way to get more information about what exactly failed, so I can debug the problem?
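The log above reports "Timeout waiting for execute reply (30s)", and nbconvert's execute timeout appears to apply per cell. At roughly 20 s per step, one rough way to check whether a training cell can finish inside that budget is to time a few steps and project the total. This is a minimal sketch, assuming the training step can be wrapped in a zero-argument callable; `time_steps`, `fits_in_budget`, and the sleeping stand-in step are hypothetical helpers, not part of the assignment.

```python
import time
from typing import Callable, Tuple

# Taken from the grader log: "Timeout waiting for execute reply (30s)".
GRADER_CELL_TIMEOUT_S = 30.0


def time_steps(step_fn: Callable[[], None], n_steps: int) -> Tuple[float, float]:
    """Run the (hypothetical) step_fn n_steps times.

    Returns (mean seconds per step, total seconds).
    """
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    total = time.perf_counter() - start
    return total / n_steps, total


def fits_in_budget(seconds_per_step: float, n_steps: int,
                   budget_s: float = GRADER_CELL_TIMEOUT_S) -> bool:
    """True if n_steps at the given rate would finish within budget_s."""
    return seconds_per_step * n_steps <= budget_s


if __name__ == "__main__":
    # Stand-in for a real training step: sleeps 10 ms.
    mean_s, total_s = time_steps(lambda: time.sleep(0.01), n_steps=3)
    print(f"mean {mean_s:.3f} s/step, total {total_s:.3f} s")
    # At ~20 s per step, one step fits in 30 s but two steps do not.
    print(fits_in_budget(20.0, 1), fits_in_budget(20.0, 2))
```

The last line prints `True False`: two steps at 20 s already need about 40 s, more than the 30 s the grader waits, so any cell that trains for more than one step at that speed would be expected to time out.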

If you can provide information on where you suspect the problem might be and how you implemented that part, I can take a look at it and maybe help.

Thanks again for offering to help! I completely reset the whole notebook and coded it from scratch. Still the same error, so I guess it must be a logic error then? I am running out of ideas right now. I put my implementations below; maybe you can spot the issue?

Hi again,

It's not a good idea to paste solutions here because it's detrimental to other learners. I looked at the code and it seems fine as far as I can tell. Are you by any chance running the optional part on Coursera? That part is not supposed to be run on Coursera.

Hi! No, I just edited the two code blocks that needed editing and then submitted.

Remove or comment out the optional part and try submitting again.
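Commenting the code out works. If you would rather keep the optional code in the notebook to run later on your own machine, one alternative is to guard it behind a flag, since the grader executes the whole notebook ("Executing notebook with kernel: python3" in the log). This is a minimal sketch in which `RUN_OPTIONAL_PART`, `run_if_enabled`, and `optional_training` are hypothetical names; the real optional cell would replace the stand-in function.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Hypothetical switch, not part of the assignment: keep it False before
# submitting so the grader never starts the long-running optional part.
RUN_OPTIONAL_PART = False


def run_if_enabled(enabled: bool, fn: Callable[[], T]) -> Optional[T]:
    """Call fn only when enabled; otherwise skip it and return None."""
    if not enabled:
        print("Skipping optional part (flag is off).")
        return None
    return fn()


def optional_training() -> str:
    # Stand-in for the expensive optional training loop.
    return "trained"


if __name__ == "__main__":
    result = run_if_enabled(RUN_OPTIONAL_PART, optional_training)
    print(result)  # None while the flag is False
```

With the flag left at `False`, the guarded cell returns almost immediately, so it should not count against the grader's 30-second limit.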

Oh, that did the trick. Thanks again for your help.