Hi,
In the video on ResNets, Andrew mentioned that due to weight decay, the weights and bias can become zero.
Since we are using ReLU, the activations satisfy a >= 0, and
a[l+2] = g(W[l+2] * a[l+1] + b[l+2] + a[l])
so if W[l+2] and b[l+2] shrink to zero, this reduces to
a[l+2] = g(a[l]) = a[l]
So, in this case, does that mean we can discard those two layers, since both activations are the same, or are they still required?
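For concreteness, here is a quick numeric check of that reasoning (a NumPy sketch; the vector size, the `relu` helper, and the random values are just illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# a[l] is itself the output of a ReLU, so it is non-negative.
a_l = relu(np.random.randn(4, 1))
a_l_plus_1 = relu(np.random.randn(4, 1))

# Suppose regularization has driven the layer l+2 parameters to zero.
W_l_plus_2 = np.zeros((4, 4))
b_l_plus_2 = np.zeros((4, 1))

# a[l+2] = g(W[l+2] a[l+1] + b[l+2] + a[l]) collapses to g(a[l]) = a[l].
a_l_plus_2 = relu(W_l_plus_2 @ a_l_plus_1 + b_l_plus_2 + a_l)
print(np.allclose(a_l_plus_2, a_l))   # True: the block reduces to the identity
```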
Can someone help me?
Thanks.
You cannot discard them, because they are built into the model, and in different scenarios they might not be “0”. If they come close to 0, they simply contribute (almost) nothing; that is their effect.
It might be worth watching the lectures again. I don’t think Prof Ng uses the term “weight decay” anywhere there. What he does talk about is vanishing and exploding gradients, which make it difficult to successfully train very deep networks, meaning networks with lots of layers.

The innovative technique in Residual Networks is that they use the parallel “skip” connections as an alternate pathway through some of the layers of the network, and it turns out that this has a moderating effect on the training and helps to keep the gradients from vanishing or exploding. As Gent says, you need to keep both connections, because both are part of the network and they work together.

As you’ll also hear Prof Ng say in the lectures, the goal is not to learn the “identity mapping” on the skip connection, since that would not be a very interesting solution. The point is that having that alternative pathway also participating in the training helps to keep things “on the rails”, meaning that the training is more likely to converge and give a useful solution.
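To make the “both paths are part of the network” point concrete, here is a minimal sketch of a residual block using the Keras functional API (the Dense layers and the size 64 are just placeholders; the course assignment builds the same pattern with convolutional blocks). The Add layer wires the skip connection and the main path together, so the two layers cannot be removed without changing the graph:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, units):
    shortcut = x                                   # the "skip" pathway: just a[l]
    x = layers.Dense(units, activation="relu")(x)  # layer l+1
    x = layers.Dense(units)(x)                     # layer l+2 (linear part, z[l+2])
    x = layers.Add()([x, shortcut])                # z[l+2] + a[l]
    return layers.Activation("relu")(x)            # a[l+2] = g(z[l+2] + a[l])

inputs = tf.keras.Input(shape=(64,))
outputs = residual_block(inputs, 64)
model = tf.keras.Model(inputs, outputs)
model.summary()   # both Dense layers and the Add (skip) appear in the graph
```

If training drives W[l+2] and b[l+2] towards zero, the block simply passes a[l] through, but the layers are still there and are free to learn something more useful than the identity whenever that helps.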