Question about Coursera DL CNN Week 2: Why ResNets Work?

Hello everyone. I just watched the video introducing ResNets, but I don't understand some of the details. For the equation a[l+2] = g(w[l+2] * a[l+1] + b[l+2] + a[l]), why can we assume that w[l+2] = 0?

Why do you believe that’s what is being assumed?

In the video at about 3:15, Andrew considers the case where w[l+2] is equal to 0. I want to know why we make an assumption like that. Is he trying to say that even in the result computed by the later layer, we can still get the information from a[l]?

Andrew is pointing out that if you use L2 regularization, it pushes down the magnitude of the weight values.

In the extreme case, all of the weights will be very close to zero. If w[l+2] ≈ 0 and b[l+2] ≈ 0, the equation collapses to a[l+2] = g(a[l]), and since g is ReLU and a[l] is already non-negative, that is just a[l+2] = a[l]. In other words, the skip connection makes it easy for the residual block to learn the identity function, so adding the block does not hurt the network, and anything the weights learn beyond the identity can only help.
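
Here is a minimal NumPy sketch of that extreme case (the shapes and values are made up just for illustration): with w[l+2] and b[l+2] driven to zero and g = ReLU, the block output comes back exactly equal to a[l].

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Hypothetical activations from layer l (non-negative, as if they
# already passed through a ReLU earlier in the network).
a_l = np.array([0.5, 1.2, 0.0])

# An ordinary hidden layer between l and l+2 (arbitrary small weights).
W1 = np.random.randn(3, 3) * 0.01
b1 = np.zeros(3)
a_l1 = relu(W1 @ a_l + b1)

# The extreme case Andrew describes: L2 regularization has shrunk
# w[l+2] and b[l+2] to (essentially) zero.
W2 = np.zeros((3, 3))
b2 = np.zeros(3)

# Residual block output: a[l+2] = g(w[l+2] * a[l+1] + b[l+2] + a[l])
a_l2 = relu(W2 @ a_l1 + b2 + a_l)

print(a_l2)                     # [0.5 1.2 0. ]
print(np.allclose(a_l2, a_l))   # True: the block computes the identity
```

Without the "+ a_l" skip term, zeroed weights would wipe out the signal entirely (output all zeros); with it, the block gracefully falls back to passing a[l] through unchanged.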