Hi All,
I have gone through the U-Net architecture, but I couldn’t understand why it first downsamples, which reduces the height and width of the image while increasing the number of channels, and then upsamples with transposed convolutions, which basically brings the image back to its original size. It looks like we are just getting the image back after applying all the downsampling and upsampling.
Can anyone explain this? It would help a lot.
Thanks
Kailash
Your best bet is to watch the lectures from Prof. Ng on this topic again. I don’t think anything I type will be as effective as listening to what he says. Here are two thoughts to hold in mind as you watch the lecture:
- In your post, you didn’t mention the “skip connections”. Those are a pretty key aspect of the U-Net architecture. You can think of their role as making it easier for the upsampling to reconstruct the structure of the image by passing intermediate information from the downsampling path directly across to the corresponding upsampling layers.
- When the upsampling path reconstructs the image, the result has the height and width of the original image but is in a newly “labeled” form: it shows the distilled information from recognizing the various objects, which was the purpose of the downsampling path. You can think of the upsampling as “blending” the recognition information with the structural information from the “skip” connections.
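To make the two points above a bit more concrete, here is a minimal shape-only sketch in Python. It does no real convolutions; it only tracks how (height, width, channels) change through a small U-Net. The input size, filter counts, depth, number of classes, and the helper names `down_block`, `up_block`, and `trace_unet` are all illustrative assumptions of mine, not the course's implementation.

```python
# Shape-only trace of a small U-Net: no real convolutions, just bookkeeping.
# All sizes and helper names here are hypothetical, chosen for illustration.

def down_block(shape, filters):
    """Conv with 'same' padding sets channels to `filters`; 2x2 pooling halves H and W."""
    h, w, _ = shape
    skip = (h, w, filters)               # saved before pooling, for the skip connection
    pooled = (h // 2, w // 2, filters)
    return pooled, skip

def up_block(shape, skip, filters):
    """Stride-2 transposed conv doubles H and W; the skip is concatenated on channels."""
    h, w, _ = shape
    upsampled = (h * 2, w * 2, filters)
    assert upsampled[:2] == skip[:2], "skip and upsampled maps must match in H and W"
    merged = (upsampled[0], upsampled[1], filters + skip[2])   # channel concatenation
    return (merged[0], merged[1], filters)                     # conv back down to `filters`

def trace_unet(input_shape=(128, 128, 3), base_filters=32, depth=3, n_classes=5):
    shapes = [("input", input_shape)]
    x, skips, filters = input_shape, [], base_filters
    for level in range(depth):                 # contracting (downsampling) path
        x, skip = down_block(x, filters)
        skips.append(skip)
        shapes.append((f"down{level + 1}", x))
        filters *= 2
    x = (x[0], x[1], filters)                  # bottleneck conv: most channels, smallest H x W
    shapes.append(("bottleneck", x))
    for level in reversed(range(depth)):       # expanding (upsampling) path
        filters //= 2
        x = up_block(x, skips[level], filters)
        shapes.append((f"up{level + 1}", x))
    x = (x[0], x[1], n_classes)                # 1x1 conv: one score per class per pixel
    shapes.append(("output", x))
    return shapes

for name, shape in trace_unet():
    print(f"{name:>10}: {shape}")
```

With these assumed settings, the output has the same height and width as the input (128 x 128) but 5 channels instead of 3: one score per class for every pixel. So the network is not handing back the original picture; it produces a label map in the shape of the original image, and the skip connections are what supply the fine spatial detail needed to place those labels accurately.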
Thanks for your response!!! Yes, now I get it. During downsampling, the network keeps the relevant information about the image, and during upsampling, that information is reconstructed at the original size of the image.