# How does padding work in transpose convolution?

In the video, only the first row and the first column are removed. Why are the last row and last column kept?

In the example in this link, the first and last rows and columns are removed.

I am really confused about it.

I think you're just misinterpreting the slides. All the grey pixels around the edges are ignored. It looks a bit asymmetric only because stride = 2 means you only get two horizontal operations: the third one would take you off the end.

Then how do you determine the size of the output before the operation? In other words, why is the last grey column in the sixth column rather than the fifth column?

That is the definition of how padding works. The formula for the dimensions of the output is given on this thread.

Using the formula n_{out}=(n_{in}-1)\times s+f-2p with n_{in}=2, s=2, p=1, f=3, we get n_{out}=(2-1)\times 2+3-2\times 1=3. So is the output in this slide wrong?
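
The arithmetic in this post can be checked with a few lines of Python. This is only a minimal sketch: the function name `transposed_conv_output_size` is made up for illustration and does not come from the course or any library.

```python
def transposed_conv_output_size(n_in, f, s, p):
    """Output size from n_out = (n_in - 1) * s + f - 2p."""
    return (n_in - 1) * s + f - 2 * p


# Values from the slide: 2x2 input, 3x3 filter, stride 2, padding 1.
print(transposed_conv_output_size(n_in=2, f=3, s=2, p=1))  # 3
```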

You may be right. More research needed …

I have not really looked carefully at the meaning of padding in transposed convolutions. I will try to find more information and let you know if I can find anything relevant.

Hello @ZHONG_Yiyuan, and @paulinpaloalto,

I think both Andrew's result and Paul's formula are NOT wrong. There is some ambiguity built into this. If we look at this formula for computing the output size of the normal convolution:

n_{out}=\lfloor (n_{in}+2p-f)/s \rfloor+1

The floor operation makes the following situation possible: different input sizes can give the same output size. Now, if we are to design the transposed convolution operation, and we are given an input image of size (2 x 2), should the operation return a (3 x 3) or a (4 x 4) matrix?

That is the ambiguity. There are two possibilities in my above example.

In short, the n_{out} = 3 that you calculated using Paul's formula and the n_{out} = 4 that you see in Andrew's video are two of the possibilities. This makes more sense if you do a normal convolution with a (3 x 3) input and another with a (4 x 4) input, using a (3 x 3) kernel, stride = 2, padding = 1: both result in a (2 x 2) output. Again, that is the ambiguity.
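
To see the ambiguity concretely, here is a small sketch of the normal convolution output size, n_out = floor((n_in + 2p - f) / s) + 1, evaluated for both input sizes. The helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(n_in, f, s, p):
    """Normal convolution: n_out = floor((n_in + 2p - f) / s) + 1."""
    return (n_in + 2 * p - f) // s + 1


# 3x3 kernel, stride 2, padding 1, applied to a 3x3 and a 4x4 input.
for n_in in (3, 4):
    print(n_in, "->", conv_output_size(n_in, f=3, s=2, p=1))
# 3 -> 2
# 4 -> 2
```

Both inputs collapse to the same 2 x 2 output, so going backwards from 2 x 2 there is no single correct answer.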

Now the problem becomes: how should an implementation of the transposed convolution address this? PyTorch uses the same equation as Paul's as far as only stride, padding, kernel size, and image size are concerned. TensorFlow has different equations depending on how you parameterize it.
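
As a rough illustration of how an implementation can resolve the ambiguity, the PyTorch documentation for `ConvTranspose2d` gives the output size as (H_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1. The helper below only transcribes that formula in plain Python; its name is hypothetical and it does not call PyTorch.

```python
def conv_transpose_output_size(n_in, kernel_size, stride=1, padding=0,
                               output_padding=0, dilation=1):
    """Plain-Python transcription of the documented ConvTranspose2d size formula."""
    return ((n_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# With dilation = 1 and output_padding = 0 this reduces to (n_in - 1) * s + f - 2p.
print(conv_transpose_output_size(2, 3, stride=2, padding=1))                    # 3
# output_padding = 1 selects the other candidate size.
print(conv_transpose_output_size(2, 3, stride=2, padding=1, output_padding=1))  # 4
```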

Therefore, Andrew's implementation results in what you see in the video, and you can implement your own that results in what you described in your first post. What does not change is that we use the steps described by Andrew to do all the element-wise multiplications and then place the results in the right place in a matrix whose output shape is pre-computed (by a formula like Paul's, PyTorch's, TensorFlow's, Andrew's, or yours).
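
The steps can be sketched in plain Python as follows. This is a sketch under stated assumptions, not the course's or any library's implementation: it handles a single-channel square input and square kernel, the function name `transposed_conv2d` is made up, and the `output_padding` argument is assumed to add its extra rows and columns at the bottom and right.

```python
def transposed_conv2d(x, kernel, stride, padding, output_padding=0):
    n_in, f = len(x), len(kernel)
    # Canvas large enough for every stamped copy of the kernel,
    # plus any extra rows/columns requested via output_padding.
    full = (n_in - 1) * stride + f + output_padding
    canvas = [[0] * full for _ in range(full)]

    # Multiply the kernel by each input value and add it at the strided offset.
    for i in range(n_in):
        for j in range(n_in):
            for a in range(f):
                for b in range(f):
                    canvas[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]

    # Pre-computed output size; crop `padding` rows/columns from the top and left.
    n_out = (n_in - 1) * stride + f - 2 * padding + output_padding
    rows = canvas[padding:padding + n_out]
    return [row[padding:padding + n_out] for row in rows]


x = [[1, 2], [3, 4]]
ones = [[1] * 3 for _ in range(3)]
print(transposed_conv2d(x, ones, stride=2, padding=1))
# [[1, 3, 2], [4, 10, 6], [3, 7, 4]]
print(transposed_conv2d(x, ones, stride=2, padding=1, output_padding=1))
# [[1, 3, 2, 2], [4, 10, 6, 6], [3, 7, 4, 4], [3, 7, 4, 4]]
```

The multiply-and-accumulate loop is identical in both calls; only the pre-computed output shape, and therefore the crop, differs.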

Raymond

I find the way that TensorFlow infers the OutputSize when output_padding=None difficult to understand.

You are welcome @ZHONG_Yiyuan!

I am taking a walk now thinking what I should add to the article before I forget about it. Haha, the article helps me remember things.

As for the output padding, I didn't look into that either. Although I am going to update that part of my article, I am not going to get into the details of that, unless perhaps I see some documentation about it from TensorFlow.

I am not very interested in it because, without more context, the output_padding parameter does nothing more than adjust the output shape. It doesn't change the arithmetic, which is the core. The parameter has an impact, but it doesn't look extremely important to me at this stage.

Perhaps we need to ask ourselves who will use that parameter and for what purpose. I don't have an answer to that, but when you have one, please share it with me.

Cheers,
Raymond

Hello @ZHONG_Yiyuan, I have updated my article to explain that the purpose of output_padding is to address that ambiguity. However, it is not going to cover output_padding=None.

Raymond

The way I see it, the output size of a normal convolution is given by
[(n+2p-f)/s + 1], where [ ] is the greatest integer (floor) function.

Treating the 2x2 input of the transposed convolution as the output of a normal convolution with f=3, p=1, s=2, we have

[(n + 2*1 - 3)/2 + 1] = 2

hence
[(n-1)/2] = 1

therefore
1 <= (n-1)/2 < 2
2 <= n-1 < 4
3 <= n < 5

hence
n = 3 or 4
@rmwkwok @paulinpaloalto @ZHONG_Yiyuan
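
The same inequality can be solved for general values: floor((n + 2p - f) / s) + 1 = n_out holds exactly when (n_out - 1) * s + f - 2p <= n <= (n_out - 1) * s + f - 2p + s - 1. Below is a minimal sketch with a hypothetical helper name, assuming the parameters describe a valid convolution.

```python
def input_sizes_for_output(n_out, f, s, p):
    """All n with floor((n + 2p - f) / s) + 1 == n_out."""
    smallest = (n_out - 1) * s + f - 2 * p
    return list(range(smallest, smallest + s))


candidates = input_sizes_for_output(2, f=3, s=2, p=1)
print(candidates)  # [3, 4]

# Sanity check: each candidate really convolves down to size 2.
for n in candidates:
    assert (n + 2 * 1 - 3) // 2 + 1 == 2
```

The smallest candidate is the value given by n_{out}=(n_{in}-1)\times s+f-2p, and a stride of s leaves s candidates in total.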

Hello, @tarunsaxena1000,

Yes, either an input size of 3 or 4, as you calculated, may produce 2 as the output size!

Raymond
