Posting this to make sure I'm understanding this correctly.
-
In a very deep neural network, as we keep training it, the weight values may become very small (maybe due to L2 regularization / weight decay, etc.).
-
Assuming the bias term is 0, in a plain network,
a[l+2] = g( w[l+2]*a[l+1] )
Because of the point above, w[l+2] may be so small that we can almost ignore the term "w[l+2]*a[l+1]".
So a[l+2] = g(0) ≈ 0 (maybe not exactly 0, but a really, really small value), which means the activation vanishes, and this may hurt the network's performance.
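A quick numerical check of this point (just my own toy example with made-up layer sizes, using NumPy and ReLU as g):

```python
import numpy as np

# Toy "plain" layer: ReLU activation, near-zero weights, bias = 0.
rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

a_l1 = relu(rng.standard_normal(4))        # some activation a[l+1]
w_l2 = 1e-6 * rng.standard_normal((4, 4))  # weights shrunk toward 0 (e.g. by weight decay)

a_l2 = relu(w_l2 @ a_l1)                   # a[l+2] = g( w[l+2]*a[l+1] )
print(a_l2)                                # all entries ~0: the activation has essentially vanished
```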
-
However, if we use a residual block, what happens is
a[l+2] = g( w[l+2]*a[l+1] + a[l] )
ending up with a[l+2] ≈ g(a[l]) = a[l] (since g is ReLU and a[l] is already non-negative), i.e. an identity function.
So <even if something goes wrong and the weights are almost 0, the residual block will turn into an identity function> and keep the activations alive.
That phrase in <> is what "it is easy for a residual block to learn the identity function" refers to, from what I understood. And that is why residual blocks help improve performance.
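And the same toy example with the skip connection added (again just a sketch I wrote to check the math, not anything official):

```python
import numpy as np

# Toy residual block: two ReLU layers with near-zero weights, plus the skip connection a[l].
rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

a_l = relu(rng.standard_normal(4))          # a[l], the input to the residual block (non-negative)
w_l1 = 1e-6 * rng.standard_normal((4, 4))   # both weight matrices nearly 0
w_l2 = 1e-6 * rng.standard_normal((4, 4))

a_l1 = relu(w_l1 @ a_l)                     # a[l+1] = g( w[l+1]*a[l] )
a_l2 = relu(w_l2 @ a_l1 + a_l)              # a[l+2] = g( w[l+2]*a[l+1] + a[l] )
print(np.allclose(a_l2, a_l, atol=1e-5))    # True: the block behaves like an identity function
```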
Am I understanding right?