U-Net Intuition

Hi Sir,

@paulinpaloalto
@XpRienzo
@reinoudbosch

In the lecture video U-Net Intuition, we have a couple of questions. Can you please help clarify them?

  1. What is the high-level contextual information of an image? How does it differ from the low-level features of an image?

  2. Here is my understanding of why skip connections are needed. Please correct me if I'm wrong: the final layer receives high-level contextual information from the previous layers, which means the network has learned the cat regions of the image very well, but those regions are at a lower resolution, so the detected cat region is not very sharp. By using skip connections we recover a higher-resolution output. Am I right, sir?

The whole point of U-net is that the goal is to reconstruct exactly the shapes of the original image, but “painted” with their semantic labels instead of whatever colors they happened to be in the original image. The “downsampling” path just consists of normal “Conv” layers, so you lose all the high resolution spatial information and convert it into “feature recognition” information. That’s what Conv layers do, right? So how are you going to preserve or reconstruct the structures and shapes of the original image in the final output? That is what the skip layers give you.
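To make that concrete, here is a minimal Keras-style sketch of the idea (not the course notebook; the layer counts, filter sizes, and the 64 x 64 input are illustrative assumptions). The downsampling path shrinks height and width while building up feature information, and the skip connections concatenate the earlier high-resolution activations back into the upsampling path so the output can have per-pixel detail again:

```python
# A minimal U-Net-style sketch (illustrative sizes, not the course notebook).
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(64, 64, 3))

# Downsampling path: Conv layers learn "what" is in the image,
# pooling throws away "where" (height/width shrink).
c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)   # 64x64x16
p1 = layers.MaxPooling2D(2)(c1)                                        # 32x32x16
c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)       # 32x32x32
p2 = layers.MaxPooling2D(2)(c2)                                        # 16x16x32

# Bottleneck: rich semantic features, low spatial resolution.
b = layers.Conv2D(64, 3, padding="same", activation="relu")(p2)        # 16x16x64

# Upsampling path: transpose convolutions grow height/width back, and the
# skip connections re-inject the fine spatial detail saved in c2 and c1.
u2 = layers.Conv2DTranspose(32, 3, strides=2, padding="same")(b)       # 32x32x32
u2 = layers.concatenate([u2, c2])                                      # 32x32x64
u1 = layers.Conv2DTranspose(16, 3, strides=2, padding="same")(u2)      # 64x64x16
u1 = layers.concatenate([u1, c1])                                      # 64x64x32

# Per-pixel class scores: same height/width as the input image.
outputs = layers.Conv2D(3, 1, activation="softmax")(u1)                # 64x64x3

model = tf.keras.Model(inputs, outputs)
model.summary()
```

Without the two `concatenate` calls, the decoder would have to reconstruct the object boundaries purely from the 16 x 16 bottleneck, which is exactly the "lost spatial information" problem described above.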

Sir, what does it mean that spatial information is lost when the image dimensions are reduced?

We’ve talked about that specific point before, haven’t we? The point is that as you go through the layers of a ConvNet, the height and width dimensions reduce, unless you always and only do “same” padding. And there usually are pooling layers which also reduce the height and width dimensions. That means that what influences the output of a given neuron deep in the network comes from a larger geographical area of the input image.

Let’s suppose we start with 64 x 64 pixel images. Now think about what happens at the very first layer with a 3 x 3 filter: you know within 3 pixels where whatever influences the output of a given neuron is within the image, right? It’s in one very specific 3 x 3 patch of the input image. That is precise spatial information: you know where something is in the image. Now think about what happens late in the network when the height and width are reduced to, say, 2 x 2: all you can say is that what influences the neuron at the (0, 0) position of that layer came from somewhere in the upper left quadrant of the input image. That’s an area that’s 32 x 32, right? So that’s much lower resolution spatial information: 3 x 3 is more precise than 32 x 32. This is not a deep or subtle point.

Or think about the case in which your network is just looking for a yes/no answer about whether there is a cat in the image or not. So the output of the final layer is 1 x 1. If it’s “yes”, then all you can say is that there is a cat somewhere in that 64 x 64 image, but you have no idea where. So you have completely lost the spatial information that is contained in the input image. I think ai_curious gave you exactly that example a couple of days ago.
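If it helps, you can watch the shrinking happen by just printing shapes. This is a small sketch under assumed settings (the filter counts and “valid” padding are illustrative, not from the lecture):

```python
# Watch the height/width shrink as a 64x64 image goes through conv + pooling.
import tensorflow as tf
from tensorflow.keras import layers

x = tf.zeros((1, 64, 64, 3))            # one dummy 64x64 RGB image
print("input:", x.shape)                 # (1, 64, 64, 3)

x = layers.Conv2D(8, 3, padding="valid", activation="relu")(x)
print("conv1:", x.shape)                 # (1, 62, 62, 8)  ~3x3 receptive field
x = layers.MaxPooling2D(2)(x)
print("pool1:", x.shape)                 # (1, 31, 31, 8)

x = layers.Conv2D(16, 3, padding="valid", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)
print("pool2:", x.shape)                 # (1, 14, 14, 16)

x = layers.Conv2D(32, 3, padding="valid", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)
print("pool3:", x.shape)                 # (1, 6, 6, 32)
# Each of those 6x6 positions now "sees" a patch of roughly 22x22 input
# pixels, so a feature can only be localized to a coarse region of the
# original image; that is the spatial information being lost.
```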

Exactly

@anbu here’s a free tip. Next time try looping in mentor @google and see what you can find.