Notice that the PyTorch API has two different forms of padding: *input* padding (the `padding` argument) and *output* padding (`output_padding`). You treated them as the same when you were doing the algebra on the formula, but they're not. The TF version of the API makes the same distinction.

I was confused because I kept finding different formulas for the output size in different places, but here's a thread from Raymond that answers all of this. It turns out the output size is ambiguous if you think of a transposed convolution as the inverse of a forward convolution: as soon as the stride is greater than 1, more than one input size produces the same output size in the forward direction, right? So what is the size of the "inverse"? You have a choice to make, and apparently not everyone agrees on how to make it.
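Here's a small pure-Python sketch of that size arithmetic, using the output-size formulas from the PyTorch `Conv2d` and `ConvTranspose2d` docs (no torch needed, just the formulas):

```python
def conv_out(n, k, s, p):
    # Forward convolution output size: floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

def conv_transpose_out(n, k, s, p, output_padding=0):
    # Transposed convolution output size, per the PyTorch docs:
    # (n - 1) * s - 2p + k + output_padding
    return (n - 1) * s - 2 * p + k + output_padding

# With stride 2, two different input sizes collapse to the same output:
assert conv_out(7, k=3, s=2, p=1) == 4
assert conv_out(8, k=3, s=2, p=1) == 4

# So "inverting" from size 4 is ambiguous; output_padding picks the answer:
assert conv_transpose_out(4, k=3, s=2, p=1, output_padding=0) == 7
assert conv_transpose_out(4, k=3, s=2, p=1, output_padding=1) == 8
```

So `output_padding` doesn't actually pad anything in the input; it just resolves which of the forward-compatible sizes the transposed convolution should produce.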