Residual Connection - Exploding Gradients

Hey there!

In the lecture "Why ResNets Work?", Professor Ng mentioned the vanishing and exploding gradient problems, and that residual connections help alleviate both of them.

I understand how adding the previous layer's activation (a[l]) to the later output (a[l+2]) can solve the vanishing gradient problem: the gradient flowing back through the skip connection from a[l] stays non-zero even when the main path's gradients shrink, so the earlier layers can still learn.

What I'm having a hard time understanding is how residual connections help with exploding gradients. Adding those two matrices would just produce an output with even larger values.

I would appreciate any help :blush:


Hi bardh,

You can have a look here. The idea is that skip connections simplify the network mathematically, and as a result the exploding gradient problem is circumvented.
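To make that intuition concrete, here is a rough numeric sketch (my own illustration, not from the lecture) of why the identity path keeps gradients tame with depth. In a plain chain of layers, the backpropagated gradient is roughly a product of per-layer derivative factors w, which vanishes when |w| < 1 and explodes when |w| > 1. In a residual chain, each block contributes a factor of (1 + w) instead, and because the block can easily learn a residual branch near zero (the identity mapping), w tends to stay small, so the product stays close to 1:

```python
import numpy as np

# Toy scalar model of gradient scaling with depth.
# Plain chain: gradient ~ product of per-layer derivatives w.
# Residual chain: each block contributes (1 + w), where w is the
# derivative of the residual branch (easy to keep near zero).
depth = 50
w_small = 0.02   # residual branch close to identity
w_large = 1.2    # per-layer derivative slightly above 1

plain_vanish  = w_small ** depth        # shrinks toward 0 (vanishing)
plain_explode = w_large ** depth        # blows up (exploding)
residual      = (1 + w_small) ** depth  # identity term keeps it moderate

print(f"plain chain, small derivatives: {plain_vanish:.3e}")
print(f"plain chain, large derivatives: {plain_explode:.3e}")
print(f"residual chain, small branch:   {residual:.3f}")
```

The point is not that adding two matrices shrinks the values, but that the skip path gives the gradient a route whose factor is anchored at 1, so the network never has to multiply 50 unconstrained factors together in the first place.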
