Vanishing/Exploding gradients C2W1

What are vanishing/exploding gradients, and how can I mitigate them?

Hi there

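Vanishing gradients occur when the gradients of the parameters of a DNN become so small that the model learns only very slowly and it seems “nothing” is happening. The layers closest to the input are usually hit hardest, because backpropagation multiplies many small per-layer factors together on the way back to them.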
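To see the vanishing effect numerically, here is a minimal NumPy sketch (an illustration, not course code). It builds a hypothetical 20-layer stack of dense + sigmoid layers with random weights, backpropagates an upstream gradient of all ones, and records the gradient norm at each layer. Because the sigmoid derivative is at most 0.25, the norm shrinks geometrically toward the input. The function name, depth, width and seed are arbitrary assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_norms_through_sigmoid_stack(n_layers=20, width=50, seed=0):
    """Hypothetical toy network: n_layers dense layers, each followed by a sigmoid.

    Runs a forward pass on a random input, then backpropagates an upstream
    gradient of all ones and returns ||dL/da|| for each layer's input,
    ordered from the output side (index 0) to the input side (index -1).
    """
    rng = np.random.default_rng(seed)
    # Weights with variance 1/width (Xavier-style), so the forward pass stays well-scaled
    weights = [rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
               for _ in range(n_layers)]
    a = rng.normal(size=(width, 1))
    zs = []
    for W in weights:
        z = W @ a
        zs.append(z)
        a = sigmoid(z)

    grad = np.ones_like(a)  # assumed dL/da at the network output
    norms = []
    for W, z in zip(reversed(weights), reversed(zs)):
        s = sigmoid(z)
        dz = grad * s * (1.0 - s)  # sigmoid'(z) = s(1 - s) <= 0.25, so this shrinks grad
        grad = W.T @ dz            # gradient w.r.t. this layer's input activation
        norms.append(float(np.linalg.norm(grad)))
    return norms

norms = gradient_norms_through_sigmoid_stack()
print(f"near output: {norms[0]:.2e}   near input: {norms[-1]:.2e}")
```

The layer nearest the input receives a gradient many orders of magnitude smaller than the layer nearest the output, so its weights barely move during training.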

Exploding gradients describe the opposite situation: the gradients become extremely large, causing e.g. unstable updates or numerical issues such as overflow to inf/NaN.
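Both effects can be illustrated with a toy deep *linear* network where every weight matrix is `scale * I` (an assumed setup, chosen because the arithmetic is easy to follow). Each backward step multiplies the gradient by W^T, so after L layers its norm is scaled by `scale ** L`: a factor slightly above 1 explodes, slightly below 1 vanishes. `backprop_linear_stack` is a hypothetical helper name.

```python
import numpy as np

def backprop_linear_stack(scale, n_layers=50, width=4):
    """Toy deep *linear* network whose weight matrices all equal scale * I.

    Backpropagating through one linear layer multiplies the gradient by W^T,
    so after n_layers layers its norm is multiplied by scale ** n_layers.
    """
    W = scale * np.eye(width)
    grad = np.ones((width, 1))  # assumed dL/d(output); its norm is sqrt(width) = 2
    for _ in range(n_layers):
        grad = W.T @ grad       # chain rule through z = W a
    return float(np.linalg.norm(grad))

print(backprop_linear_stack(1.5))  # explodes (about 2 * 1.5**50)
print(backprop_linear_stack(0.5))  # vanishes (about 2 * 0.5**50)
```

Real networks are nonlinear and their weights are not multiples of the identity, but the same repeated multiplication drives both failure modes.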

You can mitigate vanishing gradients e.g. by using activation functions like ReLU; see also this thread.
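Why ReLU helps can be seen from the derivatives alone. This minimal sketch compares the largest derivative of the sigmoid (0.25, reached at z = 0) with ReLU's derivative (exactly 1 for positive inputs), then raises each to the power of an assumed depth of 20, the best-case factor a gradient picks up from 20 stacked activations. The sample inputs and depth are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)          # peaks at 0.25 when z == 0

def relu_grad(z):
    return (z > 0).astype(float)  # exactly 1 for positive inputs, 0 otherwise

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
max_sigmoid_grad = sigmoid_grad(z).max()
max_relu_grad = relu_grad(z).max()

depth = 20
sigmoid_factor = max_sigmoid_grad ** depth  # 0.25**20, about 9.1e-13
relu_factor = max_relu_grad ** depth        # 1.0
print(sigmoid_factor, relu_factor)
```

Even in the best case, 20 sigmoid layers scale the gradient down by roughly 1e-12, while active ReLU units pass it through unchanged.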

Further best practices for mitigation include careful weight initialisation (e.g. He or Xavier initialisation), weight decay and batch normalization to stabilise the activations. If you see gradients exploding, you can also clip the gradients (or the weights, with bounded optimization) or reduce the learning rate. It also makes sense to monitor your gradient flow, see also this thread!
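Several of these practices can be sketched in plain NumPy. This is a minimal, hypothetical example: the helper names and the values `max_norm=1.0`, `learning_rate=0.01` and `weight_decay=1e-4` are assumptions, not recommendations. It shows He initialisation, an L2 weight-decay term, logging the global gradient norm, and clipping gradients by their global norm, which is the most common form of clipping used against exploding gradients. Batch normalization is left out to keep it short. In practice frameworks provide this for you, e.g. PyTorch's `torch.nn.utils.clip_grad_norm_`.

```python
import numpy as np

def he_init(n_in, n_out, rng):
    """He initialisation: weights ~ N(0, 2 / n_in), usually paired with ReLU."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))

def global_grad_norm(grads):
    """L2 norm over all gradient arrays together; useful to log every step."""
    return float(np.sqrt(sum(np.sum(g * g) for g in grads)))

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients by one common factor if their global norm exceeds max_norm.

    Returns the (possibly rescaled) gradients and the norm measured before clipping.
    """
    norm = global_grad_norm(grads)
    if norm <= max_norm:
        return grads, norm
    scale = max_norm / norm
    return [g * scale for g in grads], norm

def sgd_step(params, grads, learning_rate=0.01, weight_decay=1e-4, max_norm=1.0):
    """One hypothetical SGD update with L2 weight decay and gradient clipping."""
    # Gradient of the L2 penalty (weight_decay / 2) * ||p||^2 is weight_decay * p
    grads = [g + weight_decay * p for p, g in zip(params, grads)]
    clipped, norm_before = clip_by_global_norm(grads, max_norm)
    new_params = [p - learning_rate * g for p, g in zip(params, clipped)]
    return new_params, norm_before

# Usage: a deliberately huge ("exploding") fake gradient gets capped
rng = np.random.default_rng(0)
W1 = he_init(784, 128, rng)
fake_grads = [np.full_like(W1, 10.0)]
params, norm_before = sgd_step([W1], fake_grads)
print(f"gradient norm before clipping: {norm_before:.1f}")
```

Clipping the global norm, rather than each element separately, rescales every gradient by the same factor, so the update keeps its direction and only its length is capped. Logging `norm_before` at each step is a cheap way to spot exploding gradients early.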

If you want to read more, including about additional mitigation techniques, feel free to take a look at this source.

Best
Christian

In addition to Christian’s excellent explanations, note that Prof Ng discusses those topics at several points in the C2 lectures. Have you gotten to the lecture “Vanishing / Exploding Gradients” in C2 Week 1 yet?
