Hi @spather,
welcome to the community!
Yes, your statement in the 2nd post is true: vanishing gradients can also occur with non-linear activation functions.
Take sigmoid or tanh as examples: they carry this risk because they saturate (flatten out) towards the tails, which makes the gradient “vanish”.
- ReLU, in contrast, carries a reduced risk of vanishing gradients since its gradient is constant in the positive section: it does not saturate, unlike sigmoid or tanh (see the small numeric sketch below). See also: Activation functions
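
To make the saturation point concrete, here is a minimal NumPy sketch (my own illustration, not part of the course material): it evaluates the derivatives of sigmoid, tanh and ReLU at a few inputs and shows how the first two shrink towards zero for large |x| while ReLU stays constant on the positive side.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # max 0.25 at x=0, ~0 for large |x|

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2  # max 1 at x=0, ~0 for large |x|

def d_relu(x):
    return (x > 0).astype(float)  # constant 1 for x > 0, no saturation

x = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
print("sigmoid'(x):", np.round(d_sigmoid(x), 6))  # tiny in the tails -> vanishing gradient risk
print("tanh'(x):   ", np.round(d_tanh(x), 6))     # tiny in the tails -> vanishing gradient risk
print("relu'(x):   ", d_relu(x))                  # stays 1 on the positive side
```

Multiplying many such near-zero derivatives across layers during backpropagation is exactly what makes the overall gradient vanish.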
In addition, exploding gradients can also occur with non-linear activation functions. This can be driven by poorly chosen hyperparameters; see also the links below.
Here you can find some mitigation strategies, such as gradient clipping (a short clipping sketch follows after the links) and others:
- Vanishing/Exploding gradients C2W1 - #2 by Christian_Simonis
- https://machinelearningmastery.com/exploding-gradients-in-neural-networks/
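
For gradient clipping specifically, here is a minimal sketch assuming PyTorch (the model, the dummy batch and `max_norm=1.0` are just placeholders I chose for illustration, not something from your notebook):

```python
import torch
import torch.nn as nn

# Tiny placeholder model and dummy batch, purely for illustration
model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(64, 10)
y = torch.randn(64, 1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()

# Rescale all gradients so their global norm does not exceed max_norm,
# which limits the damage a single exploding gradient step can do
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

The same idea exists in other frameworks as well (e.g. clipnorm / clipvalue arguments of Keras optimizers); the key point is to bound the gradient magnitude before the parameter update.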
Please let me know if anything is unclear, @spather, and don’t hesitate to ask.
Best regards
Christian