What is the formula for getting the output dimension in case of transpose convolution? It’s not mentioned in the video.

Is this from Course 3?

This is from DLS Course 4 (ConvNets). Transposed convolutions are introduced in Week 3 as part of the discussion of U-Net and semantic segmentation.

Here’s a thread which discusses the formula for computing the output size of a transposed convolution.

I had the same question and found the answer in the PyTorch documentation for `ConvTranspose2d` [1], which provides a formula to determine the output width and height of a transposed convolution (scroll to the section of the doc labeled `Shape`). A simplified version of that formula is copied below:

n_{out} = (n_{in} - 1) × s - p + f

where

- n_{out} is the output width or height (same in both dimensions)
- n_{in} is the input width or height (same in both dimensions)
- s is the stride (same in both dimensions)
- p is the padding (same in both dimensions)
- f is the filter (kernel) size (same in both dimensions)
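For comparison, the full formula from the `Shape` section of the PyTorch docs can be written as a small helper (the function name and argument defaults here are mine, not PyTorch's):

```python
def conv_transpose_out_size(n_in, stride, padding, kernel,
                            output_padding=0, dilation=1):
    """Output width/height of a transposed convolution, per the formula
    in the `Shape` section of the PyTorch ConvTranspose2d docs."""
    return ((n_in - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)
```

Note that with dilation = 1, setting output_padding equal to the (input) padding collapses this to n_{out} = (n_{in} - 1) × s - p + f, a simplified form with a single padding parameter.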

[1] ConvTranspose2d — PyTorch 2.0 documentation

In the week 3 lecture on transpose convolution we have the following values for the parameters:

- n_{in} = 2
- s = 2
- p = 1
- f = 3

Plugging these values into the formula above gives (2 - 1) × 2 - 1 + 3 = 4, the same dimensions as in the video.
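As a quick check (a sketch of the documented formula, not PyTorch itself), plugging the lecture values into the full PyTorch formula with dilation = 1 shows how output_padding changes the result:

```python
# Lecture values from the week 3 transpose-convolution example
n_in, s, p, f = 2, 2, 1, 3

# Full PyTorch formula with dilation = 1, as a function of output_padding
out = lambda op: (n_in - 1) * s - 2 * p + (f - 1) + op + 1

print(out(0))  # 3
print(out(1))  # 4
```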

Note: The formula in the PyTorch doc is different from the one @paulinpaloalto posted at the end of this thread: "Lecture slide 44 (of 47) - Using Keras to duplicate calculations".

Notice that there are two different forms of padding in the PyTorch API: *input* padding and *output* padding. You have treated them as the same when you were doing algebra on the formula, but they're not really. The TensorFlow version of that API makes the same distinction.
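To illustrate the distinction (again a sketch of the documented size formula with dilation = 1; the helper name is mine): input padding shrinks the output, while output padding grows it:

```python
def ct_out(n_in, s, p, f, op=0):
    # ConvTranspose2d output size with dilation = 1 (per the PyTorch docs)
    return (n_in - 1) * s - 2 * p + (f - 1) + op + 1

print(ct_out(2, 2, p=0, f=3))        # 5
print(ct_out(2, 2, p=1, f=3))        # 3 -- input padding removes rows/cols
print(ct_out(2, 2, p=1, f=3, op=1))  # 4 -- output padding adds them back
```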

I was confused by the fact that I found different formulas for the output size in different places, but here's a thread from Raymond which answers all this. It turns out that the output size is ambiguous if you consider transpose convolutions as the inverse of forward convolutions. As soon as the stride is > 1, there is more than one input size that gives the same output size in the forward case, right? So what is the size of the inverse? You have a choice to make, and apparently not everyone agrees on how to make that choice.
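You can see the ambiguity with the standard forward-convolution size formula (the helper below is just a sketch of that formula):

```python
def conv_out_size(n_in, stride, padding, kernel):
    # Forward convolution output size: floor((n_in + 2p - f) / s) + 1
    return (n_in + 2 * padding - kernel) // stride + 1

# With stride 2, two different input sizes give the same output size,
# so the size of the "inverse" (transposed) convolution is ambiguous:
print(conv_out_size(5, stride=2, padding=1, kernel=3))  # 3
print(conv_out_size(6, stride=2, padding=1, kernel=3))  # 3
```

This is exactly the degree of freedom that PyTorch's `output_padding` parameter exists to resolve.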