Lecture slide 44 (of 47) - Using Keras to duplicate calculations

I tried to calculate the transpose convolution described on slide 44 of the lecture notes (and in more detail in the video) using Keras as well as numpy. I could easily write the code in numpy, but my Keras code is not giving me the expected answers. I am obviously doing something wrong (still learning Keras). Can anyone tell me what I am doing wrong? Here is the code:

import tensorflow as tf
import numpy as np 
conv_trans = tf.keras.layers.Conv2DTranspose(filters=1, kernel_size=(3,3), strides=(2,2), padding='same',  output_padding=1)
# This is the input matrix 
input_x = np.array([[2,1],[3,2]])
input_x_tf = tf.convert_to_tensor(input_x.reshape(1,2,2,1), dtype=tf.float32)
# Call the layer once so it builds and initializes its weights
y = conv_trans(input_x_tf)
# Manually set the weights for the layer. This should be the filter used in the lecture
w = np.array([[1,2,1],[2,0,1],[0,2,1]])
w = w.reshape(3,3,1,1)
w_tf = tf.convert_to_tensor(w, dtype=tf.float32)
c_w = conv_trans.get_weights()  # [kernel, bias]; the zero-initialized bias is left as-is
c_w[0] = w_tf 
conv_trans.set_weights(c_w)
y = conv_trans(input_x_tf)
print(y.shape)
print(tf.reshape(y, (4,4)))

Output:

(1, 4, 4, 1)
tf.Tensor(
[[ 2.  4.  3.  2.]
 [ 4.  0.  4.  0.]
 [ 3. 10.  7.  6.]
 [ 6.  0.  7.  0.]], shape=(4, 4), dtype=float32)

Which lecture notes are you referring to (which week and lecture)?

Apparently it’s from Week 3.

Sorry, but I don’t understand what your code is trying to do.

I am trying to use TensorFlow/Keras API for transpose convolution calculation. I wanted to reproduce Andrew’s calculations on slide 44 of Week 3 using numpy (from scratch) and Keras. Numpy was easy but I am struggling with Keras.

  • w holds the filter weights used on slide 44.
  • input_x is the 2x2 matrix input.
  • y should be the 4x4 matrix from the lecture.

When I look at the API docs, I see padding, output_padding, and dilation_rate parameters that I do not quite understand. I think, in general, I do not understand transpose convolution as well as I should. A good reference would be useful.
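For reference, here is a minimal numpy version of the calculation I was describing. This is my sketch of the standard scatter-add formulation (crop `pad` cells from each edge, with `out_pad` extra cells kept at the bottom/right, which is one common convention), not the exact code from the lecture:

import numpy as np

def conv2d_transpose_manual(x, w, stride=2, pad=1, out_pad=1):
    # Scatter-add form of transposed convolution: stamp x[i, j] * w onto a
    # zero canvas at stride offsets, sum the overlaps, then crop `pad`
    # cells from each edge (with `out_pad` extra cells kept bottom/right).
    n, f = x.shape[0], w.shape[0]
    n_out = (n - 1) * stride + f - 2 * pad + out_pad
    canvas = np.zeros(((n - 1) * stride + f + out_pad,) * 2)
    for i in range(n):
        for j in range(n):
            canvas[i*stride:i*stride+f, j*stride:j*stride+f] += x[i, j] * w
    return canvas[pad:pad+n_out, pad:pad+n_out]

x = np.array([[2., 1.], [3., 2.]])
w = np.array([[1., 2., 1.], [2., 0., 1.], [0., 2., 1.]])
print(conv2d_transpose_manual(x, w))  # 4x4, but cropped differently than Keras

Interestingly, the Keras output I posted above is exactly the top-left 4x4 of the uncropped canvas in this sketch, so at least part of the disagreement appears to be about which sides get cropped.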

Sorry, I was not able to figure it out either.

Maybe another mentor will stop by and assist.

My guess is that the problem is that what Keras means by “same” padding is different from what you might expect. I have only checked this in the “forward convolution” case, but I’ll bet the same rule applies to transposed convolutions as well. What Keras does is compute the padding for “same” with the stride hard-coded to 1. That means you only actually get an output the same size as the input if the stride really is 1. For any stride > 1, the output will not be the same size. If you think this is a clear violation of the Principle of Least Astonishment, I wouldn’t disagree with you. Which part of “same” don’t they understand? Why they do it that way, I have no idea, but that’s the way it works. Try it and see!
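If you want the smallest possible check (a standalone sketch, no dataset needed): with kernel_size = 3, Keras solves for “same” padding as if the stride were 1 (giving p = 1), so with stride = 2 the output is halved instead of staying the same size:

import tensorflow as tf

x = tf.random.normal((1, 8, 8, 3))  # toy 8x8 input
conv = tf.keras.layers.Conv2D(filters=3, kernel_size=3, strides=2, padding='same')
print(conv(x).shape)  # (1, 4, 4, 3), not the (1, 8, 8, 3) that "same" suggests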

I have been meaning to get back to this and actually construct some experiments that demonstrate this, but have not had time. Maybe this will motivate me to get that done sooner rather than later. But if you have time to try that, please show us what you learn! If you can show that what I said above is wrong, that would be a big relief! :laughing:

Update January 2023: it turns out there is a good reason why “same” padding is only calculated with stride = 1. If you actually solve for the amount of padding required to get the same sized output when stride is 2 or greater, the numbers are crazy. Here’s a more recent thread that shows some examples of what would be required.
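To give a flavor of the problem (my arithmetic, using the forward size formula that appears later in this thread): keeping a 96-wide input at 96 with f = 3 and stride = 2 would require

n_{out} = \left\lfloor \dfrac{96 + 2p - 3}{2} \right\rfloor + 1 = 96 \implies p = 49

That is padding more than half the width of the input on each side, so it is arguably reasonable that Keras does not attempt it.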

Here’s my experiment for forward convolutions. I used the dataset in the U-Net exercise in C4 Week 3:

# Cell to experiment with padding on Conv2D
# (run inside the C4 W3 U-Net notebook, where Conv2D and processed_image_ds are defined)
print("padding = 'valid' stride = 1")
playmodel = tf.keras.Sequential()
playmodel.add(Conv2D(filters = 3, kernel_size = 3, padding = 'valid'))
for playimage, _ in processed_image_ds.take(1):
    playimage = tf.expand_dims(playimage, 0)
    print(f"playimage.shape {playimage.shape}")
    playout = playmodel(playimage)
    print(f"playout.shape {playout.shape}")

print("padding = 'valid' stride = 2")
playmodel = tf.keras.Sequential()
playmodel.add(Conv2D(filters = 3, kernel_size = 3, strides = 2, padding = 'valid'))
for playimage, _ in processed_image_ds.take(1):
    playimage = tf.expand_dims(playimage, 0)
    print(f"playimage.shape {playimage.shape}")
    playout = playmodel(playimage)
    print(f"playout.shape {playout.shape}")

print("padding = 'same' stride = 1")
playmodel = tf.keras.Sequential()
playmodel.add(Conv2D(filters = 3, kernel_size = 3, padding = 'same'))
for playimage, _ in processed_image_ds.take(1):
    playimage = tf.expand_dims(playimage, 0)
    print(f"playimage.shape {playimage.shape}")
    playout = playmodel(playimage)
    print(f"playout.shape {playout.shape}")

print("padding = 'same' stride = 2")
playmodel = tf.keras.Sequential()
playmodel.add(Conv2D(filters = 3, kernel_size = 3, strides = 2, padding = 'same'))
for playimage, _ in processed_image_ds.take(1):
    playimage = tf.expand_dims(playimage, 0)
    print(f"playimage.shape {playimage.shape}")
    playout = playmodel(playimage)
    print(f"playout.shape {playout.shape}")
    
print("padding = 'same' stride = 3")
playmodel = tf.keras.Sequential()
playmodel.add(Conv2D(filters = 3, kernel_size = 3, strides = 3, padding = 'same'))
for playimage, _ in processed_image_ds.take(1):
    playimage = tf.expand_dims(playimage, 0)
    print(f"playimage.shape {playimage.shape}")
    playout = playmodel(playimage)
    print(f"playout.shape {playout.shape}")

Running that gives this result:

padding = 'valid' stride = 1
playimage.shape (1, 96, 128, 3)
playout.shape (1, 94, 126, 3)

padding = 'valid' stride = 2
playimage.shape (1, 96, 128, 3)
playout.shape (1, 47, 63, 3)

padding = 'same' stride = 1
playimage.shape (1, 96, 128, 3)
playout.shape (1, 96, 128, 3)

padding = 'same' stride = 2
playimage.shape (1, 96, 128, 3)
playout.shape (1, 48, 64, 3)

padding = 'same' stride = 3
playimage.shape (1, 96, 128, 3)
playout.shape (1, 32, 43, 3)

For forward convolutions, the formula is:

n_{out} = \left\lfloor \dfrac{n_{in} + 2p - f}{s} \right\rfloor + 1

So you can see that what they do in the “same” padding case is always solve for p assuming stride = 1, and then use the resulting padding value (p = 1 in the examples above) regardless of what the requested stride actually is. As a result, “same” padding only gives the same output shape when the actual stride really is 1. For any stride > 1, the output is smaller.
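As a quick sanity check (my sketch), here is the formula applied with p = 1 to the 96 x 128 input above; it reproduces all three “same” output shapes:

# Forward-conv output size; Keras solves for p with s = 1, giving p = 1 for f = 3
def conv_out(n, f=3, s=1, p=0):
    return (n + 2 * p - f) // s + 1

for s in (1, 2, 3):
    print(s, conv_out(96, s=s, p=1), conv_out(128, s=s, p=1))
# 1 96 128
# 2 48 64
# 3 32 43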

Stay tuned while I try the analogous thing with transposed convolutions.

Here is the Transpose version:

# Cell to experiment with padding on Conv2DTranspose (same notebook context as above)
print("padding = 'valid' stride = 1")
playmodel = tf.keras.Sequential()
playmodel.add(Conv2DTranspose(filters = 3, kernel_size = 3, strides = 1, padding = 'valid'))
for playimage, _ in processed_image_ds.take(1):
    playimage = tf.expand_dims(playimage, 0)
    print(f"playimage.shape {playimage.shape}")
    playout = playmodel(playimage)
    print(f"playout.shape {playout.shape}")

print("padding = 'valid' stride = 2")
playmodel = tf.keras.Sequential()
playmodel.add(Conv2DTranspose(filters = 3, kernel_size = 3, strides = 2, padding = 'valid'))
for playimage, _ in processed_image_ds.take(1):
    playimage = tf.expand_dims(playimage, 0)
    print(f"playimage.shape {playimage.shape}")
    playout = playmodel(playimage)
    print(f"playout.shape {playout.shape}")
    
print("padding = 'same' stride = 1")
playmodel = tf.keras.Sequential()
playmodel.add(Conv2DTranspose(filters = 3, kernel_size = 3, strides = 1, padding = 'same'))
for playimage, _ in processed_image_ds.take(1):
    playimage = tf.expand_dims(playimage, 0)
    print(f"playimage.shape {playimage.shape}")
    playout = playmodel(playimage)
    print(f"playout.shape {playout.shape}")

print("padding = 'same' stride = 2")
playmodel = tf.keras.Sequential()
playmodel.add(Conv2DTranspose(filters = 3, kernel_size = 3, strides = 2, padding = 'same'))
for playimage, _ in processed_image_ds.take(1):
    playimage = tf.expand_dims(playimage, 0)
    print(f"playimage.shape {playimage.shape}")
    playout = playmodel(playimage)
    print(f"playout.shape {playout.shape}")
    
print("padding = 'same' stride = 3")
playmodel = tf.keras.Sequential()
playmodel.add(Conv2DTranspose(filters = 3, kernel_size = 3, strides = 3, padding = 'same'))
for playimage, _ in processed_image_ds.take(1):
    playimage = tf.expand_dims(playimage, 0)
    print(f"playimage.shape {playimage.shape}")
    playout = playmodel(playimage)
    print(f"playout.shape {playout.shape}")

And here is the output from running the above:

padding = 'valid' stride = 1
playimage.shape (1, 96, 128, 3)
playout.shape (1, 98, 130, 3)

padding = 'valid' stride = 2
playimage.shape (1, 96, 128, 3)
playout.shape (1, 193, 257, 3)

padding = 'same' stride = 1
playimage.shape (1, 96, 128, 3)
playout.shape (1, 96, 128, 3)

padding = 'same' stride = 2
playimage.shape (1, 96, 128, 3)
playout.shape (1, 192, 256, 3)

padding = 'same' stride = 3
playimage.shape (1, 96, 128, 3)
playout.shape (1, 288, 384, 3)

My current understanding is that the size formula is:

n_{out} = (n_{in} - 1) \cdot s + f - 2p

I’m not sure I believe that based on the numbers I’m seeing above. More research required …
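Here is a quick check (my sketch) of that formula against the numbers above. It fits the “valid” cases exactly, while the “same” cases come out as n_in * s, which the formula can only produce with a fractional p:

# Transposed-conv output size (ignoring output_padding)
def convT_out(n, f=3, s=1, p=0):
    return (n - 1) * s + f - 2 * p

print(convT_out(96, s=1), convT_out(128, s=1))  # 98 130  ('valid', s = 1)
print(convT_out(96, s=2), convT_out(128, s=2))  # 193 257 ('valid', s = 2)
# The 'same' outputs above are 96, 192, 288, i.e. n * s; e.g. for s = 2 the
# formula would need 193 - 2p = 192, so p = 0.5 -- hence my doubts.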

But the bottom line here is that it is the same story as with forward convolutions: “same” only means “same” if the stride is 1. Otherwise it is not the same.

Update: We later got an explanation for the ambiguities in the formula for computing the output size of a transpose convolution. Here’s a thread from Raymond which gives the explanation.