As far as I know, when the weights are much smaller or much larger than one, the ACTIVATIONS either explode or vanish at some point during the forward pass. So why don’t we call it ‘vanishing/exploding activations’ instead? When these activations explode or vanish, how does that affect backprop? Can we even perform backprop? And are there situations where the activations don’t explode, but the explosion/vanishing happens only during backprop?
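For concreteness, here is a small NumPy sketch of what I mean (the depth, width, and weight scales are made-up numbers, and I use plain linear layers just to isolate the effect): when the weights are drawn with a scale below 1/√n the forward signal shrinks toward zero, and above it the signal blows up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=(1, n))

for scale in (0.5, 1.0, 2.0):           # multipliers on a 1/sqrt(n)-scaled weight matrix (made-up values)
    h = x
    for _ in range(50):                  # 50 purely linear layers, no nonlinearity
        W = scale * rng.normal(size=(n, n)) / np.sqrt(n)
        h = h @ W
    print(scale, np.abs(h).mean())       # roughly 1e-16, O(1), and 1e+15 respectively
```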
The problem is not that the activation values explode or vanish. It is a problem when the gradients of the activation functions explode or vanish, because that causes trouble for the backpropagation process: if the gradients vanish, then you can no longer make progress (learn). The gradients are what drive the weight updates, and if the gradients are close to zero, then there is very little change. If the gradients explode, then the updates overshoot and you can’t even converge to a solution.
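To make that concrete, here is a minimal sketch of a single gradient-descent update in each regime (the learning rate and gradient magnitudes are invented purely for illustration):

```python
lr = 0.1
w = 1.0

# Vanishing regime: a near-zero gradient barely moves the weight, so no learning happens.
tiny_grad = 1e-8
print(w - lr * tiny_grad)   # ~1.0, essentially unchanged

# Exploding regime: a huge gradient makes the update overshoot wildly, so training diverges.
huge_grad = 1e8
print(w - lr * huge_grad)   # -9999999.0, thrown far away from any minimum
```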
Of course the gradient of any given activation function is just one factor in the chain-rule calculation for the gradient of a given weight, but the product of many small values is very small and the product of many large values is very large.
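As a back-of-the-envelope illustration (the per-layer factors of 0.25 and 1.5 are purely made up), multiplying 20 such factors together already puts you deep into vanishing or exploding territory:

```python
# 20 chain-rule factors that are each a bit below or a bit above 1 (illustrative values only)
print(0.25 ** 20)   # ~9.1e-13  -> the gradient has effectively vanished
print(1.5 ** 20)    # ~3325     -> the gradient is exploding
```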
It is also the case that the behavior of the gradient values is tied to the behavior of the underlying activation function: e.g. both sigmoid and tanh flatten out for large values of |z|, so large |z| values cause vanishing gradients with those functions.
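You can see the flattening directly by evaluating the derivatives, σ'(z) = σ(z)(1 − σ(z)) and tanh'(z) = 1 − tanh²(z), at a few values of z:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # peaks at 0.25 when z = 0

def dtanh(z):
    return 1.0 - np.tanh(z) ** 2    # peaks at 1.0 when z = 0

for z in (0.0, 2.0, 5.0, 10.0):
    print(z, dsigmoid(z), dtanh(z))
# At z = 10 the sigmoid slope is ~4.5e-5 and the tanh slope is ~8.2e-9:
# both are effectively flat, so they contribute near-zero factors to the chain rule.
```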
So in the case of the ReLU activation function, we would only have to worry about vanishing gradients for values of z < 0?
Exactly! The behavior depends on the activation function. Of course there’s a relatively simple solution for the ReLU case if you hit this problem: switch to Leaky ReLU.
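If it helps, here is a quick sketch of the difference in the gradients (the 0.01 slope on the negative side is just a commonly used default, not a required value):

```python
import numpy as np

def relu_grad(z):
    return (z > 0).astype(float)           # exactly 0 for z < 0: no gradient flows back

def leaky_relu_grad(z, alpha=0.01):        # alpha = slope on the negative side (common default)
    return np.where(z > 0, 1.0, alpha)     # small but nonzero gradient for z < 0

z = np.array([-3.0, -0.5, 0.5, 3.0])
print(relu_grad(z))         # [0.   0.   1.   1.  ]
print(leaky_relu_grad(z))   # [0.01 0.01 1.   1.  ]
```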