Explanation of padding='same' for Conv2DTranspose in the U-Net architecture

I'm stuck on understanding why we use padding='same' in the Conv2DTranspose operation in the expansion blocks of the U-Net architecture.

According to the documentation, padding='same' should keep the height and width the same even after the Conv2DTranspose operation, e.g. (32, 64) stays (32, 64).

But as we can see, the dimensions still double: (32, 64) -> (64, 128).

Any clarification on this would be greatly appreciated.

Best,

Ah, yes, I remember running into this same confusion on plain vanilla convolutions a while back. It turns out that the way TF interprets “same” padding is that it pads in a way that would give you the same dimensions if you also have stride = 1. But if the stride is > 1, then the dimensions still shrink on normal convolutions and still expand on transpose convolutions. Note that we use stride = 2 on the transpose convolutions here, which is why you still get the 2x expansion.
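The shape arithmetic behind this can be sketched with the standard TF/Keras output-size formulas (a minimal sketch; the helper names here are my own, not library functions):

```python
from math import ceil, floor

def conv_out(n, f, s, padding):
    """Output size of a normal convolution (n = input size, f = filter, s = stride)."""
    if padding == "same":
        return ceil(n / s)            # stride 1 preserves the size exactly
    return floor((n - f) / s) + 1     # "valid": no padding at all

def conv_transpose_out(n, f, s, padding):
    """Output size of a transpose convolution (assumes f >= s in the "valid" case)."""
    if padding == "same":
        return n * s                  # an exact multiple of the stride
    return (n - 1) * s + f            # "valid"

# The U-Net expansion block case: stride 2 with "same" doubles each dimension.
print(conv_transpose_out(32, 3, 2, "same"),
      conv_transpose_out(64, 3, 2, "same"))   # 64 128

# With stride 1, "same" really does leave the size unchanged:
print(conv_transpose_out(32, 3, 1, "same"))   # 32
```

So "same" means "same size at stride 1"; at stride s it means "input size times s".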


Thank you very much for the reply

So basically, if I'm not wrong, padding='same'
has no effect at all if stride > 1?

But if so, why is it mentioned that we need to specify padding='same' when we are using the convolution? (I have also noticed that it gives a dimension mismatch error with padding='valid'.)

Please clarify

Regards,

No, it does have an effect, even when stride > 1. It’s just that the output size is not invariant when stride > 1. Try the operation with everything else the same but use the default padding = “valid” and watch what happens. The results are different, right? Both with normal convolutions and transpose convolutions.

To understand the effect, you need to run the full experiment with a total of 4 combinations:

  1. padding = “same”, stride = 1
  2. padding = “valid”, stride = 1
  3. padding = “same”, stride = 2
  4. padding = “valid”, stride = 2

And do this with both normal and transpose convolutions and watch what happens. All 4 cases above will produce a unique result. Try it and see! :nerd_face:
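One way to run the transpose-convolution half of that experiment (a sketch with input size 8 and filter size 3; `transpose_test_shape` is a made-up helper name):

```python
import tensorflow as tf

def transpose_test_shape(stride, padding):
    # One Conv2DTranspose layer applied to a single 8x8x1 image of zeros;
    # we only care about the spatial part of the output shape.
    layer = tf.keras.layers.Conv2DTranspose(
        filters=1, kernel_size=3, strides=stride, padding=padding)
    out = layer(tf.zeros((1, 8, 8, 1)))
    return tuple(out.shape[1:3])

print(transpose_test_shape(1, "valid"))  # (10, 10)
print(transpose_test_shape(1, "same"))   # (8, 8)
print(transpose_test_shape(2, "valid"))  # (17, 17)
print(transpose_test_shape(2, "same"))   # (16, 16)
```

All four combinations really do give a unique result.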


Yes, it took me some time, but I finally figured it out. It's a pity there is no intuitive documentation available for this.

For the people wondering, here it is:

With padding == "same", the transpose convolution basically pads so that the output is the input height/width multiplied by the stride.

i.e., let's take:
input size = 8
filter size = 3

for stride == 1

padding == valid
we get 13 as output

The formula is output_height = filter_height + (input_height - 1) * stride, by the way
(width works the same way as height)

But what is stride * input? 8 * 1 = 8.

So when you apply padding == "same",
the output will be 8

same thing for other strides

s == 2, padding == "valid"
Output: 17

s == 2, padding == "same"
Output: 16

s == 3, padding == "valid"
Output: 24

s == 3, padding == "same"
Output: 24

Basically, the idea is that you can use padding == "same" to make sure the output is a perfect multiple of the input:

output = input * stride
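That rule is easy to confirm with actual layers (a minimal sketch; `same_convt_height` is my own helper name):

```python
import tensorflow as tf

def same_convt_height(n, s):
    # Output height of a padding="same" Conv2DTranspose on an n x n input.
    layer = tf.keras.layers.Conv2DTranspose(
        filters=1, kernel_size=3, strides=s, padding="same")
    return int(layer(tf.zeros((1, n, n, 1))).shape[1])

for s in (1, 2, 3):
    print(s, same_convt_height(8, s))  # 1 8, 2 16, 3 24 -- always 8 * s
```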

Hope this helps some other learner

Once again, thank you very much, Paul, for your super quick and helpful response.
Really appreciate it

Regards,
Shas

Thanks for working this out in the transpose convolution case. Just for completeness, let’s do the same exercise for normal forward convolutions:

Let’s use your same example with input size = 8 and filter size = 3.

Here’s a play function for this purpose:

import tensorflow as tf
import tensorflow.keras.layers as tfl

def padding_test_model(input_shape, stride, padding):
    # Single Conv2D layer, so the output shape can be read off model.summary()
    input_img = tf.keras.Input(shape=input_shape)
    Z1 = tfl.Conv2D(filters=1, kernel_size=3, strides=stride, padding=padding)(input_img)
    model = tf.keras.Model(inputs=input_img, outputs=Z1)
    return model

Now invoke that with the 4 different combinations:

testmodel = padding_test_model((8,8,1), stride=1, padding="valid")
testmodel.summary()
testmodel = padding_test_model((8,8,1), stride=1, padding="same")
testmodel.summary()
testmodel = padding_test_model((8,8,1), stride=2, padding="valid")
testmodel.summary()
testmodel = padding_test_model((8,8,1), stride=2, padding="same")
testmodel.summary()

That gives the following output:

Model: "functional_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_5 (InputLayer)         [(None, 8, 8, 1)]         0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 6, 6, 1)           10        
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
Model: "functional_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_6 (InputLayer)         [(None, 8, 8, 1)]         0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 8, 8, 1)           10        
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
Model: "functional_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_7 (InputLayer)         [(None, 8, 8, 1)]         0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 3, 3, 1)           10        
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
Model: "functional_13"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_8 (InputLayer)         [(None, 8, 8, 1)]         0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 4, 4, 1)           10        
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0

So here is a more readable version of the results:

  1. padding “valid”, stride 1 gives (6, 6) output
  2. padding “same”, stride 1 gives (8, 8) output
  3. padding “valid”, stride 2 gives (3, 3) output
  4. padding “same”, stride 2 gives (4, 4) output

Of course we have the formula:

n_{out} = \displaystyle \lfloor \frac {n_{prev} + 2p - f}{s}\rfloor + 1

So we can use that to solve for what the p value is in the stride = 2 “same” padding case:

4 = \displaystyle \lfloor \frac {8 + 2p - 3}{2}\rfloor + 1
4 = \displaystyle p + \lfloor \frac {5}{2}\rfloor + 1
4 = \displaystyle p + 2 + 1

So p = 1 in that case, which is the same solution as in the stride = 1 case, but gives a different result.
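Plugging p = 1 back into the formula confirms both "same" results, and p = 0 gives the "valid" ones (a small numeric check; the function name is mine):

```python
from math import floor

def n_out(n, p, f, s):
    # n_out = floor((n + 2p - f) / s) + 1
    return floor((n + 2 * p - f) / s) + 1

print(n_out(8, 1, 3, 1))  # 8: stride 1, "same"  (p = 1)
print(n_out(8, 1, 3, 2))  # 4: stride 2, "same"  (p = 1)
print(n_out(8, 0, 3, 1))  # 6: stride 1, "valid" (p = 0)
print(n_out(8, 0, 3, 2))  # 3: stride 2, "valid" (p = 0)
```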


Hey,

IE:
lets take
input size = 8
filter size = 3

for stride == 1

padding == valid
we get 13 as output

Here, shouldn’t we have the output as 10? Going by the formula you mentioned: output_height = filter_height + (input_height - 1) * stride
= 3 + (8 - 1) * 1 = 10
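Running the layer directly bears that out (a minimal sketch):

```python
import tensorflow as tf

# Stride-1, padding="valid" Conv2DTranspose with filter size 3 on an 8x8 input.
layer = tf.keras.layers.Conv2DTranspose(
    filters=1, kernel_size=3, strides=1, padding="valid")
out = layer(tf.zeros((1, 8, 8, 1)))
print(out.shape)  # (1, 10, 10, 1): 3 + (8 - 1) * 1 = 10
```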

There are some later threads with better information on this.

Here’s one that shows the sizes of the output of transpose convolutions with various input parameters.

Here’s one from mentor Raymond that explains the dimensions for transpose convolutions and also the potential ambiguity with output sizes.