Vanishing gradients can occur when the derivatives of the activations are small. For functions like sigmoid and tanh, the derivatives are small for large values of |z|. But what weights / activation functions will cause exploding gradients? The derivatives of common activation functions like sigmoid, tanh, and ReLU never exceed 1, so I don't see how exploding gradients are possible regardless of the values of |z|.
Hi, @Max_Rivera:
When computing the gradients, the derivatives of the activation functions are not the only terms that come up: the chain rule also multiplies in the weight matrix of each layer, so repeated products of large weights can make the gradients blow up even when the activation derivatives never exceed 1. Let me know if this explanation helps (check the links too).
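Here is a minimal NumPy sketch of that idea (not from the course materials or the linked posts; the network shape, the `weight_scale` parameter, and the function name are just illustrative). It backpropagates through a deep ReLU stack, where the activation derivative is never more than 1, and shows that the gradient norm still explodes or vanishes depending on how large the weights are:

```python
import numpy as np

def gradient_norm_wrt_input(weight_scale, num_layers=50, width=64, seed=0):
    """Backpropagate through a deep ReLU stack and return the norm of the
    gradient of (sum of outputs) with respect to the input."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)

    # Forward pass: cache the pre-activations z[l] for the ReLU derivatives.
    weights, zs, a = [], [], x
    for _ in range(num_layers):
        W = weight_scale * rng.standard_normal((width, width)) / np.sqrt(width)
        z = W @ a
        a = np.maximum(z, 0.0)          # ReLU
        weights.append(W)
        zs.append(z)

    # Backward pass: each layer multiplies the gradient by ReLU'(z) (0 or 1)
    # and then by W^T -- the weights enter the product at every layer.
    grad = np.ones(width)
    for W, z in zip(reversed(weights), reversed(zs)):
        grad = W.T @ (grad * (z > 0))

    return np.linalg.norm(grad)

for scale in (0.5, 1.0, 2.0):
    print(f"weight scale {scale}: gradient norm ~ {gradient_norm_wrt_input(scale):.3e}")
```

With small weights the gradient norm shrinks toward zero, and with large weights it grows by orders of magnitude, even though the ReLU derivative contributes at most a factor of 1 per layer. That is why the scale of the weights, not just the choice of activation, determines whether gradients vanish or explode.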
Good luck with the specialization