Eager Few Shot Object Detection Colab - UnknownError: Failed to get convolution algorithm

Hello,

When I try to run the official notebook “Eager Few Shot Object Detection Colab” in Google Colab after copying it into my own Google Drive, it fails.

EDIT: it also doesn’t work if I don’t copy it into my Drive.

Specifically, the line ‘prediction_dict = detection_model.predict(image, shapes)’ throws a series of errors, ending with ‘UnknownError: Failed to get convolution algorithm’. I also tried skipping the first cell, i.e., using TensorFlow 2.6 instead of 2.2, but I get the same error.

It looks as if Colab were using an incompatible cuDNN version, but that seems improbable.
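To check which CUDA and cuDNN versions the installed TensorFlow build was compiled against, something like the following might help. It is only a diagnostic sketch: `describe_tf_build` is a made-up helper name, and `tf.sysconfig.get_build_info()` may be missing from older TensorFlow releases, hence the `getattr` guard.

```python
def describe_tf_build():
    """Return a small report on the installed TensorFlow build, if any."""
    report = {"tensorflow_installed": False}
    try:
        import tensorflow as tf
    except Exception:
        # TensorFlow is missing or failed to load; nothing more to report.
        return report

    report["tensorflow_installed"] = True
    report["tf_version"] = tf.__version__

    # get_build_info() may not exist on older releases, so look it up defensively.
    get_info = getattr(tf.sysconfig, "get_build_info", None)
    build = dict(get_info()) if get_info else {}
    report["cuda_version"] = build.get("cuda_version")
    report["cudnn_version"] = build.get("cudnn_version")

    # An empty list means TensorFlow sees no GPU in this runtime.
    report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report


if __name__ == "__main__":
    for key, value in describe_tf_build().items():
        print(f"{key}: {value}")
```

Comparing the reported `cudnn_version` with what the Colab runtime actually ships would show whether there is a mismatch.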

Any ideas of what might be wrong?

Best,
Istvan

I'm getting what I believe is a similar issue in Exercise 7 of the C3W2_Assignment Colab notebook.

This error seems to crop up in several settings, with a couple of different possible solutions depending on what exactly people are running. See, for example, GitHub issue #24828 in tensorflow/tensorflow: “Error : Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.”

I have this same issue in the C3W2_Assignment as well.
From the solutions at the GitHub link posted by @ai_curious, I tried adding the following after importing tensorflow:
physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

but the prediction still throws the same error.
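For anyone else trying the memory-growth route, here is a slightly more defensive variant of the snippet above, as a sketch only. The helper name `enable_memory_growth` is hypothetical. It assumes the call happens before any op has touched the GPU, since TensorFlow raises a `RuntimeError` if memory growth is changed after the device has been initialized.

```python
def enable_memory_growth():
    """Enable memory growth on every visible GPU; return how many succeeded."""
    try:
        import tensorflow as tf
    except ImportError:
        # No TensorFlow in this environment, so there is nothing to configure.
        return 0

    enabled = 0
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            enabled += 1
        except RuntimeError as err:
            # Raised when the GPU was already initialized before this call.
            print(f"Could not set memory growth on {gpu.name}: {err}")
    return enabled


if __name__ == "__main__":
    print("GPUs with memory growth enabled:", enable_memory_growth())
```

A return value of 0 on a GPU runtime would suggest that TensorFlow does not see the GPU at all, in which case this setting has no effect.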

Some guidance from the experts would be appreciated.

@DeepLearning.AI-Team This might be something that merits investigation since it seems to be platform related.

EDIT: As noted below, a workaround is to disable hardware acceleration. Maybe I missed that tip in the markup?

I managed to run it without hardware acceleration, but it is paaainfully slooow.

My current understanding is that there is a tensorflow-version-related issue in the background.

Could the source of the problem be that the model seems to need TensorFlow 2.2, while the cell that installs the Object Detection API requires TensorFlow >=2.5 and therefore uninstalls v2.2?
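One way to see which version actually survives the install cells is to query the package metadata after they have run. The sketch below assumes Python 3.8+ for `importlib.metadata`; the helper names and the "2.5" threshold (taken from the requirement mentioned above) are for illustration only.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '2.7.0' or '2.5.0rc1' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("tensorflow")
    if found is None:
        print("tensorflow is not installed")
    else:
        print(f"tensorflow {found}; satisfies >=2.5: {meets_minimum(found, '2.5')}")
```

Running this right after the first cell and again after the Object Detection API cell would show whether the 2.2 pin is being replaced.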

In addition, it is rather strange that apart from the two of us, nobody else seems to report this issue…

I don’t use Colab much, but is the provisioning completely deterministic? I am under the impression it depends on what is dynamically allocated to a specific instance.

Thanks, but I don’t think it’s a real solution to the actual issue.
I’m wondering if it’s an intermittent issue with Colab.
In any case, it would be great if a mentor could respond.

Hi guys,

I also ran this notebook and got the same error. Apparently it has something to do with the GPU options of NVIDIA cuDNN. I will report this as an issue so the QA team can have a look at it.

@gent.spah FYI I saw a thread where this came up in one of the other TF courses. The short-term workaround is to disable hardware acceleration when running predict().
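In the thread the workaround was applied through the Colab runtime settings. As an alternative that is not from the thread, GPUs can usually be hidden from TensorFlow in code by setting the `CUDA_VISIBLE_DEVICES` environment variable before TensorFlow is first imported. A minimal sketch, with a hypothetical helper name:

```python
import os
import sys


def hide_gpus_from_tensorflow():
    """Hide all CUDA devices from libraries imported after this call.

    Returns True if TensorFlow had not been imported yet, i.e. the setting
    is expected to take effect, and False if it was probably set too late.
    """
    set_in_time = "tensorflow" not in sys.modules
    # "-1" is not a valid device index, so no GPU is exposed to CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return set_in_time


if __name__ == "__main__":
    if hide_gpus_from_tensorflow():
        print("GPUs hidden; import tensorflow after this point.")
    else:
        print("TensorFlow was already imported; restart the runtime first.")
```

As with the runtime setting, everything then runs on the CPU, so expect the same slowdown.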

This one seems to point to TF 2.5 vs. TF 2.6: “Failed to get convolution algorithm when executing Lab 1 in Colab”.

Hi everyone! Colab updated its backend to run TF2.7 and this affected the previous design of the assignment. We have revised the initial setup in the earlier cells. Please reopen the notebook from the classroom to see the changes. You should be able to run the assignment with a GPU now. Hope it also works for you. Thanks!

I was just getting the cuDNN version error with ‘Eager Few Shot Object Detection Colab’ at the first instruction: !pip install -U --pre tensorflow=="2.2.0". I ended up commenting that line out and restarting the runtime, and now it is working.

Hi!
When running with GPU in Colab I still get the error mentioned above by @ai_curious.

When running with CPU, the line
“ckpt.restore(checkpoint_path).expect_partial()”
raises the error “RuntimeError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for models/research/object_detection/test_data/checkpoint/ckpt-0”.
Under “models/research/object_detection/test_data/checkpoint” I only find:
checkpoint
ckpt-0-.data-00000-of-00001
ckpt-0.index

Any help would be appreciated. Thanks!
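A quick way to see whether a checkpoint prefix such as ckpt-0 matches the files on disk is sketched below. It assumes the usual TensorFlow layout, where a prefix is backed by a `<prefix>.index` file plus one or more `<prefix>.data-NNNNN-of-NNNNN` shards. The helper name `checkpoint_files` is made up. Note that a data file named exactly as typed in the listing above, with the extra hyphen before `.data` (which may just be a typo in the post), would not match the prefix.

```python
import glob
import os


def checkpoint_files(prefix):
    """Report which files on disk back the given checkpoint prefix."""
    index_path = prefix + ".index"
    # glob.escape protects any special characters that occur in the path itself.
    shards = sorted(glob.glob(glob.escape(prefix) + ".data-*-of-*"))
    return {
        "index_exists": os.path.isfile(index_path),
        "data_shards": shards,
    }


if __name__ == "__main__":
    # Path taken from the error message quoted above.
    prefix = "models/research/object_detection/test_data/checkpoint/ckpt-0"
    print(checkpoint_files(prefix))
```

If `index_exists` is False or `data_shards` is empty, the restore call has nothing to read for that prefix.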

Hi there,

I just ran the entire notebook on GPU and no issues raised, so that issue with the GPU should be resolved.

For the ckpt.restore part: this is a complex assignment with many steps, from downloading the checkpoints in Exercise 4 (where you need to be careful) all the way up to restoring them. If you cannot find the right files in the mentioned folder, it means that an earlier step was not done correctly. Also, what is the expect_partial() doing here?

Hi @gent.spah,

Thank you for your answer.
My question is about the Eager Few Shot Object Detection Colab; there are no exercises in it.

By not running the first cell (so not installing TF version 2.2.0) the Colab finally works well (on TF version 2.7.0)! Good hint from @Mark_J_Schmidt!
At last!
