Hi there
Vanishing gradients occur when the gradients with respect to the parameters of a deep neural network (DNN) become so small that the model learns only very slowly and it seems like "nothing" is happening.
Exploding gradients describe the opposite situation, where the gradients become extremely large, causing e.g. huge parameter updates or numerical issues.
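To make this concrete, here is a minimal toy sketch (assuming PyTorch, which the thread doesn't prescribe; depth, width and weight scales are just illustrative choices) that shows both effects in a deliberately deep MLP: with a saturating sigmoid the first-layer gradient collapses towards zero, while over-scaled weights make the gradients blow up instead.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def hidden_grad_norms(activation, weight_std, depth=20, width=64):
    """Build a deliberately deep MLP, run one backward pass and return the
    gradient norms of the first and the last hidden layer's weights."""
    layers = []
    for _ in range(depth):
        linear = nn.Linear(width, width)
        nn.init.normal_(linear.weight, std=weight_std)  # the weight scale drives the effect
        layers += [linear, activation()]
    model = nn.Sequential(*layers, nn.Linear(width, 1))

    x = torch.randn(32, width)
    model(x).sum().backward()                           # dummy loss, we only inspect gradients
    grads = [p.grad.norm().item() for name, p in model.named_parameters() if "weight" in name]
    return grads[0], grads[-2]                          # first hidden layer vs. last hidden layer

# Saturating sigmoid: the first-layer gradient is many orders of magnitude
# smaller than the last-layer one -> the early layers barely learn (vanishing).
print("sigmoid, small init:", hidden_grad_norms(nn.Sigmoid, weight_std=0.1))

# Over-scaled weights: the gradients grow enormous through the deep stack,
# which leads to huge updates or numerical issues (exploding).
print("relu, large init:   ", hidden_grad_norms(nn.ReLU, weight_std=0.5))
```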
You can mitigate this e.g. by using non-saturating activation functions like ReLU, see also this thread.
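Reusing the little helper from the sketch above: ReLU combined with a sensible weight scale (here He-style initialisation, std = sqrt(2 / fan_in), again just an illustrative choice) keeps the first-layer gradient in a sane range instead of collapsing or blowing up.

```python
# Same helper as above, but ReLU with He-scaled weights: the first-layer
# gradient ends up in the same ballpark as the last-layer one.
he_std = (2.0 / 64) ** 0.5
print("relu, He init:", hidden_grad_norms(nn.ReLU, weight_std=he_std))
```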
Further best practices for mitigation include weight initialisation, weight decay and batch normalization to stabilise the activations. It's also possible to clip the weights w/ bounded optimization or to reduce the learning rate if you see gradients exploding. It also makes sense to monitor your gradient flow, see also this thread!
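To tie a few of these together, here is a minimal training-step sketch (again assuming PyTorch; the post mentions clipping the weights, while the sketch uses the closely related gradient-norm clipping via torch.nn.utils.clip_grad_norm_, which conveniently also returns the pre-clipping norm so it doubles as a gradient-flow monitor).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model: a BatchNorm layer after each hidden Linear keeps the activations
# (and therefore the backpropagated gradients) in a reasonable range.
model = nn.Sequential(
    nn.Linear(64, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, 1),
)

# Weight decay and a modest learning rate are simply optimizer settings.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)

x, y = torch.randn(32, 64), torch.randn(32, 1)  # dummy batch

for step in range(5):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()

    # clip_grad_norm_ rescales gradients whose total norm exceeds max_norm and
    # returns the norm *before* clipping, so it is also a cheap monitoring signal:
    # values drifting towards 0 hint at vanishing, huge or NaN values at exploding gradients.
    total_norm = float(torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0))
    print(f"step {step}: loss={loss.item():.4f}  grad_norm={total_norm:.4f}")

    optimizer.step()
```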
If you want to read more, also with respect to additional mitigation techniques, feel free to take a look at this Source.
Best
Christian