Eager Few Shot Object Detection Colab - UnknownError: Failed to get convolution algorithm

istvan · November 15, 2021, 4:45pm

Hello,

I cannot run the official notebook “Eager Few Shot Object Detection Colab” in Google Colab after copying it into my own Google Drive.

EDIT: it also doesn’t work if I don’t copy it into my Drive.

Specifically, the line ‘prediction_dict = detection_model.predict(image, shapes)’ throws a bunch of errors, ending with ‘UnknownError: Failed to get convolution algorithm’. I also tried to run it without running the first cell, i.e., using tensorflow 2.6 instead of 2.2, but I get the same error.

It looks as if Colab were using an incompatible cuDNN version, but that seems improbable.

Any ideas of what might be wrong?

Best,
Istvan

ai_curious · November 16, 2021, 1:51pm

I am getting what I believe to be a similar issue down in Exercise 7 of the C3W2_Assignment Colab notebook.

It seems to crop up in several setups and has a couple of different possible solutions, depending on what exactly people are running. See, for example, [Error : Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. · Issue #24828 · tensorflow/tensorflow · GitHub]

istvan · November 16, 2021, 2:11pm

I have the same issue in the C3W2_Assignment as well.
From the solutions at the GitHub link posted by @ai_curious, I tried to include the following after importing tensorflow:
physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

but the prediction still throws the same error.
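A slightly more defensive form of the two lines above can be sketched as follows. This is only an illustration: `enable_memory_growth` and its `tf_module` parameter are hypothetical names introduced here, not part of the notebook. It avoids indexing `physical_devices[0]`, which raises an IndexError when no GPU is attached (for example, with hardware acceleration disabled). As reported in this thread, enabling memory growth did not remove the error in this case.

```python
def enable_memory_growth(tf_module=None):
    """Turn on memory growth for every visible GPU; return how many were configured.

    `tf_module` is a hypothetical parameter: pass the imported tensorflow module,
    or leave it as None to import tensorflow lazily inside the function.
    """
    if tf_module is None:
        import tensorflow as tf_module  # assumption: tensorflow 2.x is installed

    gpus = tf_module.config.list_physical_devices("GPU")
    for gpu in gpus:
        # This must run before any op has initialized the GPU;
        # otherwise TensorFlow raises a RuntimeError.
        tf_module.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

In a notebook, the call `enable_memory_growth()` would go right after the TensorFlow import, before any model is built. A return value of 0 means no GPU was visible to TensorFlow.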

Some guidance from the experts would be appreciated.

ai_curious · November 16, 2021, 4:11pm

@DeepLearning.AI-Team This might be something that merits investigation since it seems to be platform related.

EDIT: As noted below, a workaround is to disable hardware acceleration. Maybe I missed that tip in the markup?

istvan · November 16, 2021, 4:11pm

I managed to run it without hardware acceleration, but it is paaainfully slooow.

My current understanding is that there is a tensorflow-version-related issue in the background.

Could the source of the problem be that the model seems to need tensorflow 2.2, while the cell that installs the object detection API requires tensorflow >=2.5 and therefore uninstalls v2.2?
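One way to see which TensorFlow version actually ended up installed after the setup cells is sketched below. It uses only the Python standard library; the helper names (`version_tuple`, `installed_version`, `satisfies_minimum`) are hypothetical, and the `2.5` minimum is simply the figure mentioned in this post, not something verified against the object detection API's requirements.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '2.7.0' into (2, 7, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad so that '2.5' compares like '2.5.0'
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def satisfies_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


tf_version = installed_version("tensorflow")
print("tensorflow installed:", tf_version)
if tf_version is not None:
    # 2.5 is the minimum quoted in this thread (an assumption, not checked here).
    print("meets >=2.5:", satisfies_minimum(tf_version, "2.5"))
```

Running this once before and once after the installation cell would show whether the 2.2 pin was replaced.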

In addition, it is rather strange that apart from the two of us, nobody else seems to report this issue…

ai_curious · November 16, 2021, 4:15pm

I don’t use colab much, but is the provisioning completely deterministic? I am under the impression it depends on what is dynamically allocated to a specific instance.

istvan · November 16, 2021, 10:27pm

Thanks, but I don’t think it’s a real solution to the actual issue.
I’m wondering if it’s an intermittent issue with colab.
In any case, it would be great if a mentor could respond.

gent.spah · November 17, 2021, 9:51am

Hi guys,

I also ran this notebook and got the same error. The message suggests it has something to do with NVIDIA cuDNN on the GPU. I will report this as an issue so the QA team can have a look at it.

ai_curious · November 17, 2021, 10:32am

@gent.spah FYI, I saw a thread where this came up in one of the other TF courses. The short-term workaround here is to disable hardware acceleration to run predict().

This one seems to point to a tf 2.5 vs tf 2.6 difference: [Failed to get convolution algorithm when executing Lab 1 in Colab]

chris.favila · November 22, 2021, 4:18am

Hi everyone! Colab updated its backend to run TF2.7 and this affected the previous design of the assignment. We have revised the initial setup in the earlier cells. Please reopen the notebook from the classroom to see the changes. You should be able to run the assignment with a GPU now. Hope it also works for you. Thanks!

Mark_J_Schmidt · December 4, 2021, 7:12am

I was just getting the cuDNN version error with ‘Eager Few Shot Object Detection Colab’ with the first instruction: !pip install -U --pre tensorflow=="2.2.0". I ended up commenting that line out and restarting the runtime, and now it is working.

Ricardo_Espinagosa · January 12, 2022, 11:34pm

Hi!
When running with GPU in Colab I still get the error mentioned above by @ai_curious.

When running with CPU, the line
“ckpt.restore(checkpoint_path).expect_partial()”
raises the error “RuntimeError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for models/research/object_detection/test_data/checkpoint/ckpt-0”.
Under “models/research/object_detection/test_data/checkpoint” I only find:
checkpoint
ckpt-0-.data-00000-of-00001
ckpt-0.index

Any help would be appreciated. Thanks!

gent.spah · January 13, 2022, 10:02am

Hi there,

I just ran the entire notebook on GPU and no issues were raised, so the GPU issue should be resolved.

For the ckpt.restore part: this is a complex assignment with many steps, starting with Exercise 4, which downloads the checkpoints, and you need to be careful all the way up to restoring them. If you cannot find the right files in the mentioned folder, it means the previous steps were not done correctly. Also, what is expect_partial() doing here?
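To check whether the files behind a checkpoint prefix are where the restore call expects them, something like the following sketch may help. It uses only the standard library; `describe_checkpoint` is a hypothetical helper name. It relies on the usual TF2 checkpoint layout, where a prefix such as `ckpt-0` is backed by `ckpt-0.index` plus one or more `ckpt-0.data-XXXXX-of-YYYYY` shards. A relative prefix is resolved against the current working directory, so the absolute path is reported too.

```python
import glob
import os


def describe_checkpoint(prefix):
    """Report which on-disk files back a checkpoint prefix such as '.../ckpt-0'."""
    absolute = os.path.abspath(prefix)
    return {
        # Where a relative prefix actually points, given the current directory.
        "absolute_prefix": absolute,
        # The index file must be named exactly '<prefix>.index'.
        "index_found": os.path.isfile(absolute + ".index"),
        # Data shards must be named '<prefix>.data-XXXXX-of-YYYYY'.
        "data_shards": sorted(
            os.path.basename(path)
            for path in glob.glob(glob.escape(absolute) + ".data-*-of-*")
        ),
    }


# Prefix taken from the error message quoted in this thread.
print(describe_checkpoint(
    "models/research/object_detection/test_data/checkpoint/ckpt-0"
))
```

If `index_found` is False or `data_shards` is empty, either the working directory differs from the one the notebook assumes or the file names do not match the prefix exactly.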

Ricardo_Espinagosa · January 13, 2022, 9:36pm

Hi @gent.spah,

Thank you for your answer.
My question is about the Eager Few Shot Object Detection Colab; there are no exercises in it.

By not running the first cell (so not installing TF version 2.2.0) the Colab finally works well (on TF version 2.7.0)! Good hint from @Mark_J_Schmidt!
At last!
