Question about Coursera DL CNN Week 2: Why ResNets Work?

Hello everyone. I just watched the video introducing ResNets, but I don't understand some of the details. For the equation a[l+2] = g(w[l+2] * a[l+1] + b[l+2] + a[l]), why can we assume that w[l+2] = 0?

Why do you believe that’s what is being assumed?

In the video at about 3:15, Andrew considers the case where w[l+2] is equal to 0. I want to know why we make an assumption like that. Is he trying to say that even in the result computed by the later layer, we can still get the information from a[l]?

Andrew is pointing out that if you use L2 regularization, it pushes down the magnitude of the weight values.

In the extreme case, all of the weights will be very close to zero. If w[l+2] ≈ 0 and b[l+2] ≈ 0, the equation collapses to a[l+2] = g(a[l]), and since g is ReLU and a[l] is already non-negative, that is just a[l+2] = a[l]. In other words, the skip connection makes it easy for the residual block to learn the identity function, so adding the block does not hurt the network, and anything the weights learn beyond the identity can only help.
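
Here is a minimal NumPy sketch of that extreme case (the shapes and values are made up just for illustration): with w[l+2] and b[l+2] driven to zero and g = ReLU, the block output comes back exactly equal to a[l].

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Hypothetical activations from layer l (non-negative, as if they
# already passed through a ReLU earlier in the network).
a_l = np.array([0.5, 1.2, 0.0])

# An ordinary hidden layer between l and l+2 (arbitrary small weights).
W1 = np.random.randn(3, 3) * 0.01
b1 = np.zeros(3)
a_l1 = relu(W1 @ a_l + b1)

# The extreme case Andrew describes: L2 regularization has shrunk
# w[l+2] and b[l+2] to (essentially) zero.
W2 = np.zeros((3, 3))
b2 = np.zeros(3)

# Residual block output: a[l+2] = g(w[l+2] * a[l+1] + b[l+2] + a[l])
a_l2 = relu(W2 @ a_l1 + b2 + a_l)

print(a_l2)                     # [0.5 1.2 0. ]
print(np.allclose(a_l2, a_l))   # True: the block computes the identity
```

Without the "+ a_l" skip term, zeroed weights would wipe out the signal entirely (output all zeros); with it, the block gracefully falls back to passing a[l] through unchanged.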