Failed to get convolution algorithm when executing Lab 1 in Colab

When I run the code in C1_W3 Lab 1, the part that trains a plain neural network (without convolution layers) passes, but I get an error when running the code that adds convolution layers. The error message is included below.

I just ran the code as written in the lab and did not change anything, so I cannot figure out what is causing this. Could you upgrade this lab to a new version?

_________________________________________________________________
Epoch 1/5
---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
<ipython-input-14-3c9b5b992e4d> in <module>()
     18 model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
     19 model.summary()
---> 20 model.fit(training_images, training_labels, epochs=5)
     21 test_loss = model.evaluate(test_images, test_labels)

6 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node sequential_6/conv2d_8/Conv2D (defined at <ipython-input-14-3c9b5b992e4d>:20) ]] [Op:__inference_train_function_42579]

Function call stack:
train_function
Here is the relevant code, in case you need it:
import tensorflow as tf
print(tf.__version__)

mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

# Add a channel dimension and scale pixel values to [0, 1]
training_images = training_images.reshape(60000, 28, 28, 1)
training_images = training_images / 255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images = test_images / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(training_images, training_labels, epochs=5)  # the error is raised here
test_loss = model.evaluate(test_images, test_labels)

Hello,

I just ran the code on Colab, and it ran smoothly.
Could you check which version of TensorFlow you are using?
Also, do you see the model summary printed before the first epoch starts?

The output should look similar to the following:

2.6.0
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 64)        640       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1600)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               204928    
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1290      
=================================================================
Total params: 243,786
Trainable params: 243,786
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
1875/1875 [==============================] - 84s 45ms/step - loss: 0.4402 - accuracy: 0.8415
Epoch 2/5
1875/1875 [==============================] - 83s 45ms/step - loss: 0.2940 - accuracy: 0.8929
Epoch 3/5
1875/1875 [==============================] - 84s 45ms/step - loss: 0.2483 - accuracy: 0.9071
Epoch 4/5
1875/1875 [==============================] - 83s 45ms/step - loss: 0.2142 - accuracy: 0.9202
Epoch 5/5
1875/1875 [==============================] - 84s 45ms/step - loss: 0.1883 - accuracy: 0.9302
313/313 [==============================] - 4s 12ms/step - loss: 0.2519 - accuracy: 0.9098

I am guessing this has something to do with the TensorFlow version.
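
If it helps, here is a minimal check (my own sketch, not part of the lab code) to confirm both the version and whether the Colab runtime actually exposes a GPU, since the cuDNN message points at the GPU setup:

import tensorflow as tf

# Version the runtime is actually using (should match what any pip install cell set up)
print(tf.__version__)

# GPUs visible to TensorFlow; an empty list means training falls back to the CPU
print(tf.config.list_physical_devices('GPU'))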


Sorry, I changed one small thing in Lab 1. You are right, this is a problem with the TensorFlow version. Following the guideline from Week 1, I had always been uncommenting a line in order to install TensorFlow 2.5.0.

In more detail, the statement below

#!pip install tensorflow==2.5.0

was replaced by

!pip install tensorflow==2.5.0

After running with version 2.6.0 instead, I got a summary similar to yours, and it worked.
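
For anyone who hits the same thing: a quick sanity check (my own sketch, not from the lab) after restarting the runtime is to assert the version you expect before training:

import tensorflow as tf

# Fail fast if the runtime is still on the pinned 2.5.0 rather than the 2.6.0 that worked here
assert tf.__version__.startswith('2.6'), f'Unexpected TensorFlow version: {tf.__version__}'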

Hi, I am using TensorFlow version 2.6.0 too, but I still get the same error. Any ideas?