Thanks for working this out in the transpose convolution case. Just for completeness, let’s do the same exercise for normal forward convolutions:
Let’s use your same example with input size = 8 and filter size = 3.
Here’s a little toy function for this purpose:
import tensorflow as tf
import tensorflow.keras.layers as tfl

def padding_test_model(input_shape, stride, padding):
    input_img = tf.keras.Input(shape=input_shape)
    Z1 = tfl.Conv2D(filters=1, kernel_size=3, strides=stride, padding=padding)(input_img)
    model = tf.keras.Model(inputs=input_img, outputs=Z1)
    return model
Now invoke that with the four different combinations of stride and padding:
testmodel = padding_test_model((8,8,1), stride=1, padding="valid")
testmodel.summary()
testmodel = padding_test_model((8,8,1), stride=1, padding="same")
testmodel.summary()
testmodel = padding_test_model((8,8,1), stride=2, padding="valid")
testmodel.summary()
testmodel = padding_test_model((8,8,1), stride=2, padding="same")
testmodel.summary()
That gives the following output:
Model: "functional_7"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) [(None, 8, 8, 1)] 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 6, 6, 1) 10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
Model: "functional_9"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 8, 8, 1)] 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 8, 8, 1) 10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
Model: "functional_11"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_7 (InputLayer) [(None, 8, 8, 1)] 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 3, 3, 1) 10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
Model: "functional_13"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_8 (InputLayer) [(None, 8, 8, 1)] 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 4, 4, 1) 10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
So here is a more readable version of the results:
- padding “valid”, stride 1 gives (6, 6) output
- padding “same”, stride 1 gives (8, 8) output
- padding “valid”, stride 2 gives (3, 3) output
- padding “same”, stride 2 gives (4, 4) output
Of course we have the formula:
n_{out} = \displaystyle \lfloor \frac {n_{prev} + 2p - f}{s}\rfloor + 1
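As a quick sanity check, here is a small sketch in plain Python that applies this formula to the two “valid” cases, where p = 0 (the helper name conv_output_size is just mine for illustration):

import math

def conv_output_size(n_prev, f, s, p):
    # n_out = floor((n_prev + 2p - f) / s) + 1
    return math.floor((n_prev + 2 * p - f) / s) + 1

print(conv_output_size(8, 3, 1, 0))   # 6 -> "valid", stride 1
print(conv_output_size(8, 3, 2, 0))   # 3 -> "valid", stride 2

Both agree with the Conv2D summaries above.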
So we can use that to solve for the value of p in the stride = 2, “same” padding case:
4 = \displaystyle \lfloor \frac {8 + 2p - 3}{2}\rfloor + 1
Since 2p is an even integer, we can pull the p out of the floor:
4 = \displaystyle p + \lfloor \frac {5}{2}\rfloor + 1
4 = \displaystyle p + 2 + 1
So p = 1 in that case, which is the same padding value as in the stride = 1 “same” case (8 = 8 + 2p - 3 + 1 also gives p = 1), even though the output shape comes out different because of the stride.
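Plugging p = 1 back into the same sketch (assuming the conv_output_size helper from above) reproduces both “same” outputs:

print(conv_output_size(8, 3, 1, 1))   # 8 -> "same", stride 1
print(conv_output_size(8, 3, 2, 1))   # 4 -> "same", stride 2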