Skipping in resnets

I don’t understand how a[l+1] is really skipped in this example:

In the example above we go straight from a[l] to a[l+2] thus skipping a[l+1].
But as shown in the formula below,

a[l+2] = g(W[l+2] a[l+1] + b[l+2] + a[l])

in order to calculate a[l+2] we do have to pass through the calculation of a[l+1].
So, it seems to me as if we are actually not skipping the layer l+1 after all. In what way are we skipping l+1?

Actually, “a[l]” skips the layer l+1, but in the computation of a[l+2] we use both a[l+1] and a[l]. Note that the optimization algorithm can drive W[l+2] to zero, so in practice a[l+1] can drop out of the calculation of a[l+2].
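To make that concrete, here is a minimal NumPy sketch of a residual block (the function and variable names are illustrative, not taken from the course code). It shows that when the optimizer drives W[l+2] and b[l+2] to zero, the block reduces to the identity on a[l], so a[l+1] effectively drops out:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(a_l, W1, b1, W2, b2):
    a_l1 = relu(W1 @ a_l + b1)   # a[l+1] = g(W[l+1] a[l] + b[l+1])
    z_l2 = W2 @ a_l1 + b2        # z[l+2] = W[l+2] a[l+1] + b[l+2]
    return relu(z_l2 + a_l)      # a[l+2] = g(z[l+2] + a[l])  <- skip connection

n = 4
rng = np.random.default_rng(0)
a_l = relu(rng.standard_normal(n))   # a non-negative activation, as ReLU outputs are
W1, b1 = rng.standard_normal((n, n)), rng.standard_normal(n)

# With W[l+2] = 0 and b[l+2] = 0, z[l+2] = 0 and
# a[l+2] = relu(a[l]) = a[l], regardless of what a[l+1] was:
W2, b2 = np.zeros((n, n)), np.zeros(n)
out = residual_block(a_l, W1, b1, W2, b2)
assert np.allclose(out, a_l)
```

So a[l+1] is still computed in the forward pass; "skipping" means the output no longer has to depend on it.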

Thank you. So what I don’t understand is: if we use a[l+1] in the calculation, how come we still say that we skip a[l+1]?

Hi @Doron_Modan

Without a skip connection, a[l+2] depends only on a[l+1], and gradient descent automatically tunes the W and b weights according to the cost function. With a skip connection, a[l+2] depends on both a[l+1] and a[l], and gradient descent still tunes W and b in the same way.

The benefit is this: if your network suffers from overfitting, gradient descent can drive W close to zero, so the layer’s output depends almost entirely on a[l] and the extra layer does no harm. If your network is not overfitting, W and b stay away from zero, so the output depends on both a[l+1] and a[l]; that lets the network perform more complex computations and extract higher-level features.

So the skip connection has two benefits: it helps control overfitting, and it helps ensure that a very deep network will still converge (or at least hold steady) rather than diverge.


The new technique that ResNets introduce is that there are two paths: one that skips a[l+1] and one that doesn’t. The compute graph is no longer a simple chain. Of course that affects both forward and backward propagation, as the other replies have described.
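The effect of the two paths on backpropagation can be seen in a tiny scalar example (hypothetical, with a linear activation so the derivative is easy to read). The gradient of a[l+2] with respect to a[l] gets one term from the main path through a[l+1] and a constant "+1" from the skip path, so it cannot vanish even when the weights are tiny:

```python
def forward(a, w1, b1, w2, b2):
    a1 = w1 * a + b1           # main path through layer l+1
    return w2 * a1 + b2 + a    # skip path adds a[l] directly

# Near-zero weights, as in a deep net where gradients tend to vanish
w1, b1, w2, b2 = 0.01, 0.0, 0.01, 0.0

# Finite-difference gradient of a[l+2] with respect to a[l]
a, eps = 1.0, 1e-6
grad = (forward(a + eps, w1, b1, w2, b2)
        - forward(a - eps, w1, b1, w2, b2)) / (2 * eps)

# Chain rule: w2*w1 through the main path, plus 1 through the skip path
assert abs(grad - (w2 * w1 + 1.0)) < 1e-6
```

That extra gradient path through the skip is one reason very deep ResNets keep training where plain networks stall.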