Hello, I am finding it a bit difficult to understand exactly why a skip connection helps “the gradient to backpropagate”, and what that actually means. Could somebody point me to something that explains this in more detail?
Thank you in advance.
I think the issue is mostly that the phrasing is a bit awkward. The point is not that skip connections “help the gradient to backpropagate” in some vague sense; the real question is what the values of the gradients are and how you keep them from vanishing or exploding. It turns out that adding skip connections lets you effectively train much deeper networks than you could without them, because the skip connections help prevent vanishing and exploding gradients. Prof Ng gives some intuition for this in the lectures: the presence of the skip path moderates the behavior of each block, since the block only has to learn a small correction on top of the identity function. It is easy to start from the identity and move in a useful direction.
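To make that a bit more concrete: if a residual block computes y = x + F(x), then dy/dx = 1 + dF/dx, so even when the learned path F contributes almost nothing to the gradient, the identity (skip) path still passes the gradient back at roughly full strength. Here is a minimal sketch of that idea, using PyTorch purely for illustration (this is not the course's assignment code, and the tiny weight `w` is just a stand-in for a residual path whose gradient has shrunk):

```python
import torch

x = torch.randn(4, requires_grad=True)

# "w * x" stands in for the layers inside a block; the very small weight
# mimics a path whose gradient contribution has almost vanished.
w = torch.full((4,), 1e-6)

def residual_block(x):
    return x + w * x          # skip path (x) + residual path F(x)

def plain_block(x):
    return w * x              # the same layer without the skip connection

y_res = residual_block(x).sum()
y_res.backward()
print(x.grad)                 # ~1.0 per element: the identity path preserves the gradient

x.grad = None
y_plain = plain_block(x).sum()
y_plain.backward()
print(x.grad)                 # ~1e-6 per element: the gradient has effectively vanished
```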
You might also find it worth taking a look at the original paper on Residual Networks to see if they comment in more detail about the role of the skip connections.
Thank you, and thanks for the link to the original paper. I think I'll need to read it to properly understand how the skip connections prevent vanishing and exploding gradients.