Skipping in resnets

I don’t understand how a[l+1] is really skipped in this example:

In the example above we go straight from a[l] to a[l+2] thus skipping a[l+1].
But as shown in the formula below,

a[l+2] = g(W[l+2] a[l+1] + b[l+2] + a[l])

in order to calculate a[l+2] we do have to pass through the calculation of a[l+1].
So, it seems to me as if we are actually not skipping the layer l+1 after all. In what way are we skipping l+1?

Actually, “a[l]” skips the layer l+1, but in the computation of a[l+2] we use both a[l+1] and a[l]. Note that the optimization algorithm can drive W[l+2] to zero, so in practice a[l+1] can drop out of the calculation of a[l+2].
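To make that concrete, here is a minimal NumPy sketch of a residual block (the function and variable names are illustrative, not taken from the course code). It shows that when the optimizer drives W[l+2] and b[l+2] to zero, the block reduces to the identity on a[l], so a[l+1] effectively drops out:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(a_l, W1, b1, W2, b2):
    a_l1 = relu(W1 @ a_l + b1)   # a[l+1] = g(W[l+1] a[l] + b[l+1])
    z_l2 = W2 @ a_l1 + b2        # z[l+2] = W[l+2] a[l+1] + b[l+2]
    return relu(z_l2 + a_l)      # a[l+2] = g(z[l+2] + a[l])  <- skip connection

n = 4
rng = np.random.default_rng(0)
a_l = relu(rng.standard_normal(n))   # a non-negative activation, as ReLU outputs are
W1, b1 = rng.standard_normal((n, n)), rng.standard_normal(n)

# With W[l+2] = 0 and b[l+2] = 0, z[l+2] = 0 and
# a[l+2] = relu(a[l]) = a[l], regardless of what a[l+1] was:
W2, b2 = np.zeros((n, n)), np.zeros(n)
out = residual_block(a_l, W1, b1, W2, b2)
assert np.allclose(out, a_l)
```

So a[l+1] is still computed in the forward pass; "skipping" means the output no longer has to depend on it.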

Thank you. So what I don’t understand is: if we use a[l+1] in the calculation, how come we still say that we skip a[l+1]?

Hi @Doron_Modan

Without a skip connection, a[l+2] depends only on a[l+1], and gradient descent automatically tunes the W and b weights according to the cost function. With a skip connection, a[l+2] depends on both a[l+1] and a[l], and gradient descent still tunes W and b in the same way.

The benefit is this: if your network suffers from overfitting, gradient descent can drive W close to zero, so the layer’s output depends almost entirely on a[l] and the extra layer does no harm. If your network is not overfitting, W and b stay away from zero, so the output depends on both a[l+1] and a[l]; that lets the network perform more complex computations and extract higher-level features.

So the skip connection has two benefits: it helps control overfitting, and it helps ensure that a very deep network will still converge (or at least hold steady) rather than diverge.


The new technique that ResNets introduce is that there are two paths: one that skips a[l+1] and one that doesn’t. The compute graph is no longer a simple chain. Of course that affects both forward and backward propagation, as the other replies have described.
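The effect of the two paths on backpropagation can be seen in a tiny scalar example (hypothetical, with a linear activation so the derivative is easy to read). The gradient of a[l+2] with respect to a[l] gets one term from the main path through a[l+1] and a constant "+1" from the skip path, so it cannot vanish even when the weights are tiny:

```python
def forward(a, w1, b1, w2, b2):
    a1 = w1 * a + b1           # main path through layer l+1
    return w2 * a1 + b2 + a    # skip path adds a[l] directly

# Near-zero weights, as in a deep net where gradients tend to vanish
w1, b1, w2, b2 = 0.01, 0.0, 0.01, 0.0

# Finite-difference gradient of a[l+2] with respect to a[l]
a, eps = 1.0, 1e-6
grad = (forward(a + eps, w1, b1, w2, b2)
        - forward(a - eps, w1, b1, w2, b2)) / (2 * eps)

# Chain rule: w2*w1 through the main path, plus 1 through the skip path
assert abs(grad - (w2 * w1 + 1.0)) < 1e-6
```

That extra gradient path through the skip is one reason very deep ResNets keep training where plain networks stall.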