On the left half of the image, we see that when we do a normal convolution, the number of channels increases. However, on the right half of the image, each normal convolution keeps both the spatial dimensions and the number of channels the same. So on the right half, are we doing same-padding convolutions with the number of filters equal to the number of channels in the input representation?
No, on the right side of the U-net architecture, we are in “expansion” mode where we need to get back to the initial image size, but with the labels incorporated. So on the right side, we are using transpose convolutions, not normal convolutions. Transpose convolutions are essentially the “inverse” of normal convolutions and they expand the geometric size of the output.
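For example, here is a minimal Keras sketch (not the assignment code; the input shape and filter count are just made up for illustration) of how a stride-2, “same”-padding transpose convolution doubles the height and width while reducing the channels:

```python
import tensorflow as tf

x = tf.random.normal((1, 16, 16, 256))   # (batch, height, width, channels) - illustrative shape
t_conv = tf.keras.layers.Conv2DTranspose(
    filters=128,        # fewer output channels than the input
    kernel_size=3,
    strides=2,          # stride 2 doubles the geometric size
    padding="same",
)
y = t_conv(x)
print(y.shape)          # (1, 32, 32, 128): spatial size doubled, channels reduced
```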
On the left side (“downsampling”), we are doing normal convolutions, but also passing the output straight across to the “upsampling” phase through the “skip” connections, to make it easier to reassemble the original geometry, but now with the per-pixel labels.
This was covered in the lectures and we’ll get to see the full details when we do the U-net assignment.
@paulinpaloalto - but we are using both normal convolutions and transpose convolutions on the right side. The green arrows represent transpose convolutions and the black arrows represent regular convolutions. We do a T-CONV followed by a couple of regular CONVs.
Ok, it was not at all clear what your question actually was in the initial post. Did you actually look at what happens in the upsample block in the code? You can see that there are the following steps:
- The transpose convolution doubles the geometric size and reduces the number of channels to the desired output number.
- You concatenate the “skip” layer output so you get a lot more channels.
- Then you do 2 normal convolutions with stride = 1 and “same” padding, each with the same number of output filters as the output of step 1.
That’s just one step in the “upsampling” path of course.
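In Keras-style code, those three steps might look roughly like this. This is just a sketch with illustrative layer arguments, not the assignment’s actual upsampling_block implementation:

```python
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, concatenate

def upsampling_block_sketch(expansive_input, contractive_input, n_filters):
    # Step 1: the transpose convolution doubles H and W and sets the channel
    # count to n_filters
    up = Conv2DTranspose(n_filters, 3, strides=2, padding="same")(expansive_input)
    # Step 2: concatenating the "skip" layer output along the channel axis
    # increases the channel count again
    merge = concatenate([up, contractive_input], axis=3)
    # Step 3: two stride-1, same-padding convolutions bring the channel count
    # back down to n_filters
    conv = Conv2D(n_filters, 3, activation="relu", padding="same")(merge)
    conv = Conv2D(n_filters, 3, activation="relu", padding="same")(conv)
    return conv
```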
If the question is why the conv2d layers are necessary on the upsampling path, I don’t really know. At a simplistic level, you need to reduce the channels after concatenating the “skip” output. Take a look at the example test case they have for the upsampling_block function: they need to reduce from 160 channels to 32. So maybe the theory is that the two conv2d layers do that process of capturing all the info in the extra channels and integrating it into the 32 output channels.
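To make that channel arithmetic concrete, here is a hedged walkthrough. The spatial sizes below are made up, and I’m assuming the transpose convolution outputs 32 channels while the skip layer contributes 128, so the concatenation has 32 + 128 = 160 channels before the two conv2d layers cut it back to 32 (only the 160 → 32 reduction comes from the test case itself):

```python
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Conv2DTranspose, concatenate

expansive = tf.keras.Input(shape=(6, 8, 256))    # tensor coming up the expansive path (shape assumed)
skip = tf.keras.Input(shape=(12, 16, 128))       # skip-connection output from the down path (shape assumed)

up = Conv2DTranspose(32, 3, strides=2, padding="same")(expansive)  # -> (12, 16, 32)
merged = concatenate([up, skip], axis=3)                           # -> (12, 16, 160)
out = Conv2D(32, 3, activation="relu", padding="same")(merged)     # -> (12, 16, 32)
out = Conv2D(32, 3, activation="relu", padding="same")(out)        # -> (12, 16, 32)

print(merged.shape)  # (None, 12, 16, 160)
print(out.shape)     # (None, 12, 16, 32)
```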
The rule here is “it either works or it doesn’t”, right? The researchers who published the paper must have done some experimentation and figured out that this combination of operations works well.
This answers my question, which was how many filters are used in these layers.
Yes that makes sense.
Thanks so much @paulinpaloalto for answering this. I get it now.