Failed to get convolution algorithm when executing Lab 1 in Colab

When I run the code in C1_W3 Lab 1, the part that trains a plain neural network (without convolution layers) passes, but I get an error when running the code that adds convolution layers. The error message is included below.

I just ran the code as written in the lab and did not change anything, so I cannot figure out what is causing this. Could you upgrade this lab to a new version?

_________________________________________________________________
Epoch 1/5
---------------------------------------------------------------------------
UnknownError                              Traceback (most recent call last)
<ipython-input-14-3c9b5b992e4d> in <module>()
     18 model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
     19 model.summary()
---> 20 model.fit(training_images, training_labels, epochs=5)
     21 test_loss = model.evaluate(test_images, test_labels)

6 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node sequential_6/conv2d_8/Conv2D (defined at <ipython-input-14-3c9b5b992e4d>:20) ]] [Op:__inference_train_function_42579]

Function call stack:
train_function
Here is the relevant code, in case you need it:
import tensorflow as tf
print(tf.__version__)

mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

# Add a channel dimension and scale pixel values to [0, 1]
training_images = training_images.reshape(60000, 28, 28, 1)
training_images = training_images / 255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images = test_images / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(training_images, training_labels, epochs=5)  # the error is raised here
test_loss = model.evaluate(test_images, test_labels)

Hello,

I just ran the code on Colab, and it ran smoothly.
Could you check which version of TensorFlow you are using?
Also, do you see the model summary printed before the first epoch starts?

The output should look similar to the following:

2.6.0
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 64)        640       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1600)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               204928    
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1290      
=================================================================
Total params: 243,786
Trainable params: 243,786
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
1875/1875 [==============================] - 84s 45ms/step - loss: 0.4402 - accuracy: 0.8415
Epoch 2/5
1875/1875 [==============================] - 83s 45ms/step - loss: 0.2940 - accuracy: 0.8929
Epoch 3/5
1875/1875 [==============================] - 84s 45ms/step - loss: 0.2483 - accuracy: 0.9071
Epoch 4/5
1875/1875 [==============================] - 83s 45ms/step - loss: 0.2142 - accuracy: 0.9202
Epoch 5/5
1875/1875 [==============================] - 84s 45ms/step - loss: 0.1883 - accuracy: 0.9302
313/313 [==============================] - 4s 12ms/step - loss: 0.2519 - accuracy: 0.9098

I am guessing this has something to do with the TensorFlow version.
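
If it helps, here is a minimal check (my own sketch, not part of the lab code) to confirm both the version and whether the Colab runtime actually exposes a GPU, since the cuDNN message points at the GPU setup:

import tensorflow as tf

# Version the runtime is actually using (should match what any pip install cell set up)
print(tf.__version__)

# GPUs visible to TensorFlow; an empty list means training falls back to the CPU
print(tf.config.list_physical_devices('GPU'))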


Sorry, I changed one small thing in Lab 1. You are right, this is a problem with the TensorFlow version. Following the guideline from Week 1, I had always been uncommenting a line in order to install TensorFlow 2.5.0.

In more detail, the statement below

#!pip install tensorflow==2.5.0

was replaced by

!pip install tensorflow==2.5.0

After running with version 2.6.0 instead, I got a summary similar to yours, and it worked.
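
For anyone who hits the same thing: a quick sanity check (my own sketch, not from the lab) after restarting the runtime is to assert the version you expect before training:

import tensorflow as tf

# Fail fast if the runtime is still on the pinned 2.5.0 rather than the 2.6.0 that worked here
assert tf.__version__.startswith('2.6'), f'Unexpected TensorFlow version: {tf.__version__}'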

Hi, I am using TensorFlow version 2.6.0 too, but I still get the same error. Any ideas?