The problem of exploding/vanishing gradients

Hi,
Why would I have the problem of exploding/vanishing gradients in a deep network if I use sigmoid or tanh activation functions? No matter how large the input to these functions is, they squash it to a bounded range (0 to 1 for sigmoid, -1 to 1 for tanh), so it doesn't affect the next layer. I think the exploding/vanishing problem would occur with activation functions that don't squash the input to a bounded range, like ReLU. The only effect I can see is that, when we have many neurons per layer, the value of Z will be big.

The “exploding” or “vanishing” behavior is not referring to the actual output values of the activation functions: it is referring to the gradients of the cost with respect to the parameters. Of course (by the Chain Rule) those are the products of lots of gradients, including the derivatives of the activation functions. Notice that the “tails” of the sigmoid and tanh functions flatten out pretty aggressively as |z| increases. That means the derivatives there are close to 0. The product of lots of numbers << 1 gets smaller, right? E.g. 0.1 * 0.1 = 0.01. Whereas the product of lots of numbers > 1 gets larger.
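
To make that concrete, here is a minimal NumPy sketch (my own illustration, not from the course code) showing how small the sigmoid derivative gets in the tails, and how quickly a product of per-layer derivative factors shrinks even in the best case:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # derivative of sigmoid; its maximum is 0.25 at z = 0

# In the "tails" (|z| large) the derivative is close to 0:
for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z = {z:5.1f}   sigmoid'(z) = {sigmoid_prime(z):.6f}")

# By the Chain Rule, the gradient reaching an early layer contains one
# activation-derivative factor per layer. Even in the best case of 0.25
# per layer, 20 layers give:
print(0.25 ** 20)  # ~9.1e-13 -> the gradient has effectively vanished
```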

If the gradients are close to 0 (“vanishing”), that means Gradient Descent is “stuck” and can’t learn, or at least can’t learn very fast. If the gradients are really large numbers, then you get divergence or oscillation instead of convergence.
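
Here is a tiny hypothetical sketch of that last point (again my own toy example, not course code): plain gradient descent on f(w) = w², with the gradient artificially scaled to mimic the vanishing and exploding cases.

```python
# Gradient descent on f(w) = w**2 (true gradient is 2w), with the gradient
# artificially scaled to mimic the vanishing / exploding cases.
def run(scale, lr=0.1, steps=5, w=1.0):
    path = [round(w, 4)]
    for _ in range(steps):
        grad = scale * 2 * w
        w = w - lr * grad
        path.append(round(w, 4))
    return path

print(run(scale=1e-6))  # "vanishing": w barely moves, learning is stuck
print(run(scale=1.0))   # healthy: w decays smoothly toward the minimum at 0
print(run(scale=50.0))  # "exploding": w oscillates and blows up (divergence)
```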


That is great. Thank you!