Hi! I am doing the U-Net assignment in Week 3 of the Convolutional Neural Networks course.
The second exercise says:
Exercise 2 - upsampling_block
Implement upsampling_block(...).
For the function upsampling_block:
* Takes the arguments expansive_input (which is the input tensor from the previous layer) and contractive_input (the input tensor from the previous skip layer)
* The number of filters here is the same as in the downsampling block you completed previously
* **Your Conv2DTranspose layer will take n_filters with shape (3,3) and a stride of (2,2), with padding set to same. It's applied to expansive_input, or the input tensor from the previous layer.**
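For reference, the block those bullets describe would look roughly like this (my sketch of the standard U-Net layout, with names like n_filters and expansive_input taken from the exercise; not the exact graded code):

```python
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, concatenate

def upsampling_block(expansive_input, contractive_input, n_filters=32):
    """Sketch of a U-Net upsampling block (assumed layout, not the graded code)."""
    # Transposed convolution: stride (2,2) doubles H and W, even with padding="same"
    up = Conv2DTranspose(n_filters, (3, 3), strides=(2, 2),
                         padding="same")(expansive_input)
    # Concatenate with the skip connection from the contracting path
    merge = concatenate([up, contractive_input], axis=3)
    # Two ordinary convolutions refine the merged features
    conv = Conv2D(n_filters, (3, 3), activation="relu", padding="same")(merge)
    conv = Conv2D(n_filters, (3, 3), activation="relu", padding="same")(conv)
    return conv
```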
The Conv2DTranspose point confuses me, because this block is supposed to upsample, yet according to the TensorFlow documentation, padding "same" results in the same width/height. I think I am missing something, but I would like to know what.
Here is the TensorFlow excerpt:

> "same" results in padding with zeros evenly to the left/right or up/down of the input such that output has the same height/width dimension as the input.
The “padding” in Conv2DTranspose is different from the “padding” in a normal Conv2D. In Conv2DTranspose, the padding effectively crops the output rather than padding the input, and it is less well-defined than in Conv2D. It’s best to think of the two as completely separate things.
Because it’s not as well-defined, the TensorFlow implementation has chosen its own way of doing it. In this case, you’re better off reading the source to see what the padding actually does.
From the source, you can see that the output length is computed simply as input_length * stride when padding = "same", which is exactly what we want for the assignment (i.e., for the output length to be double the input length).
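A quick way to convince yourself (a minimal check; the toy shapes are made up for illustration):

```python
import tensorflow as tf

# Toy input: batch of 1, 8x8 spatial, 3 channels
x = tf.random.normal((1, 8, 8, 3))
up = tf.keras.layers.Conv2DTranspose(16, (3, 3), strides=(2, 2),
                                     padding="same")(x)
print(up.shape)  # (1, 16, 16, 16): output length = input_length * stride
```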
If you’re trying to implement this again in another framework like PyTorch, it’s probably best to double-check the documentation and/or source code to see what the padding parameter means in that framework. Since padding isn’t very well-defined for transposed convolution, the padding parameter in PyTorch could very well mean something completely different.
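For example, in PyTorch the padding argument trims the output, and you typically need output_padding as well to recover the exact doubling (a sketch under those assumptions; check the torch.nn.ConvTranspose2d docs for the full formula):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 8, 8)  # PyTorch is channels-first: (N, C, H, W)
# Output size = (in - 1)*stride - 2*padding + kernel_size + output_padding
# Here: (8 - 1)*2 - 2*1 + 3 + 1 = 16, matching TF's padding="same" doubling
up = nn.ConvTranspose2d(3, 16, kernel_size=3, stride=2,
                        padding=1, output_padding=1)
print(up(x).shape)  # torch.Size([1, 16, 16, 16])
```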
The important thing to note is that “same” padding only gives you the same-sized output when stride = 1. That is true for both normal and transposed convolutions. Here’s a thread about this for normal convolutions, here’s a great thread from mentor Raymond about how it works with transposed convolutions and why the output size can be ambiguous, and here’s another thread showing actual examples of output sizes for transposed convolutions with different stride values.
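To see the stride dependence concretely, here is a small experiment along the lines of those threads (toy shapes, for illustration only):

```python
import tensorflow as tf

x = tf.random.normal((1, 8, 8, 3))
for s in (1, 2, 3):
    y = tf.keras.layers.Conv2DTranspose(4, (3, 3), strides=s, padding="same")(x)
    print(f"stride={s} -> {y.shape}")
# stride=1 -> (1, 8, 8, 4)    same size only here
# stride=2 -> (1, 16, 16, 4)  doubled
# stride=3 -> (1, 24, 24, 4)  tripled
```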