In the picture (from the video "U-Net Architecture Intuition"), the prof first says that for layer 5 (red label), layer 4 provides "the high level, spatial, high level contextual information." Then, about layer 4, he says: "But what is missing is a very detailed, fine grained spatial information, because this set of activations here has lower spatial resolution; the height and width is just lower." At the end of the video, still about this layer 4, he says it has the "lower resolution, but high level, spatial, high level contextual information, as well as the low level."
Q1: In the end, is the spatial information from layer 4 high or low? I am confused here.
Q2: Then the prof talks about the link. My feeling is that, because the spatial information from layer 4 is not good enough (low), we need a link directly from layer 1, which has lower-level features but higher spatial resolution. Is this correct?
We can describe layer 1 as having the lower-level, texture-like, or higher-spatial-resolution information, as it is closer to the original image.
In contrast, we can describe layer 3 as having the higher-level, contextual, or lower-resolution information, as it is deeper in the neural network.
For your Q1 & 2, layer 4 has relatively lower spatial resolution than layer 1, and that’s why we want the skip connection to bring that to layer 5.
Because layer 4 doesn't have enough spatial information (even though the picture has been enlarged again after some layers), we give a link directly from layer 1 to layer 4, since layer 1 has that strong spatial information. Just to double check, is that right?
So, following up on that question: since we need high spatial information, why does the link start not from the very first layer but from the last layer of the downsampling path?
If I just look at the slide, the first thing that comes to my mind is the shape: we want to connect two layers that have the same height and width, because that is a condition for validly stacking two 3D matrices along the channel axis. It seems to me that your choices (blue) might not satisfy that condition.
I understand that your choice would have even more spatial information, but the slide's choice at least has more spatial information (again, we are comparing) than the layer just before the skip connection's destination.
So we want to choose a layer that has more spatial information; both your choice and the slide's choice are fine on that count, but maybe only the slide's choice has the same height and width.
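Just to illustrate what I mean by a valid stacking, here is a toy NumPy check (the shapes below are made up for illustration, not taken from the slide). Concatenating along the channel axis only works when the height and width agree:

```python
import numpy as np

# Two activation volumes with the SAME height and width -> stacking works
a = np.zeros((48, 64, 128))   # e.g. activations arriving at the skip connection's destination
b = np.zeros((48, 64, 64))    # e.g. activations carried over by the skip connection
print(np.concatenate([a, b], axis=-1).shape)   # (48, 64, 192)

# A higher-resolution layer with DIFFERENT height and width -> stacking fails
c = np.zeros((96, 128, 64))
try:
    np.concatenate([a, c], axis=-1)
except ValueError as err:
    print("Cannot stack:", err)
```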
You might want to verify the heights and widths yourself; I can't do it now because I am actually getting ready to sleep. It's midnight in my timezone.
Exactly. It only works to pass the geometric information between the steps where the shapes match. The diagrams in the assignment notebook are really clear and (although it’s been a couple of years since I watched the lectures on this) I’m sure Prof Ng explains this in the lectures as well. The whole point of the U-net architecture is that you have two paths:
The “downsampling” path, which looks like a pretty normal Convolutional Network. It analyzes all the geometric information and distills it into the distinct objects and the identities of those objects. It does that in a number of layers with decreasing geometric information and increasing semantic information as you proceed through the layers. You can see the height and width of the layer outputs decrease as the number of channels increases, as is typical in that kind of ConvNet.
Then you take the final result of the downsampling path and feed it into the “upsampling” path. The job there is to reconstruct the geometry of the original image, but with the color values of the pixels replaced by the semantic identification of the object that contains each pixel. So the upsampling path goes through several steps of “reinflating” the image using transposed convolutions, and those steps mirror the downsampling path in terms of shapes as they grow from the minimal semantic base back to the full image. What the “shortcut” or “skip” connections do is feed the geometric information from the matching steps of the downsampling path into the corresponding levels of the upsampling path. That's how the step-by-step process can succeed at reconstructing the labelled images.
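To make the shape bookkeeping concrete, here is a minimal two-level sketch in Keras. It is not the exact architecture from the lectures or the assignment (the layer sizes and variable names here are made up), but it shows the height and width shrinking on the downsampling path, a transposed convolution growing them back, and a skip connection concatenating activations that have matching height and width:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(96, 128, 3))

# Downsampling path: height/width shrink, channels grow
d1 = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)  # (96, 128, 32)
p1 = layers.MaxPooling2D()(d1)                                        # (48, 64, 32)
d2 = layers.Conv2D(64, 3, padding="same", activation="relu")(p1)      # (48, 64, 64)
p2 = layers.MaxPooling2D()(d2)                                        # (24, 32, 64)

# Bottleneck: the most semantic, least geometric representation
b = layers.Conv2D(128, 3, padding="same", activation="relu")(p2)      # (24, 32, 128)

# Upsampling path: a transposed convolution "reinflates" the image...
u2 = layers.Conv2DTranspose(64, 3, strides=2, padding="same")(b)      # (48, 64, 64)
# ...and the skip connection stacks in the mirrored downsampling activations,
# which only works because u2 and d2 share the same height and width.
u2 = layers.concatenate([u2, d2])                                     # (48, 64, 128)

model = tf.keras.Model(inputs, u2)
model.summary()  # shapes shrink on the way down, then grow back on the way up
```

If you tried to concatenate u2 with d1 or with the input instead, Keras would raise a shape error, which is exactly why each skip connection pairs up with the mirrored level of the downsampling path.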
I think what you should do now is actually go back and watch the lectures on U-net again. I’m sure Prof Ng covered everything that Raymond and I just explained.