Since both sigmoid and tanh saturate at both extremes, do networks that use sigmoid (or tanh) suffer only from vanishing gradients and not from the exploding gradient problem?
My reasoning: when the sigmoid or tanh's input Z (the weighted sum of the previous layer's activations) is either extremely small or extremely large, the derivative of the sigmoid is always close to zero. This means the gradient at the output layer when using a sigmoid activation should always be close to zero, whether the "activations" are exploding or vanishing, and so the sigmoid activation should always suffer from vanishing gradients rather than exploding gradients?
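To make concrete what I mean by the derivative shrinking at the extremes, here is a quick NumPy check (the sample values of z are arbitrary, just to show the trend):

```python
# Quick check: sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) peaks at 0.25
# when z = 0 and shrinks toward zero as |z| grows.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z = {z:4.1f}   sigmoid'(z) = {sigmoid_grad(z):.2e}")
# z =  0.0   sigmoid'(z) = 2.50e-01
# z = 10.0   sigmoid'(z) = 4.54e-05
```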
Can someone help me understand how the exploding gradient problem occurs when using the sigmoid activation function?
Kindly correct me if my comprehension of vanishing gradients is totally wrong.
Well, although the sigmoid function itself does not inherently cause the exploding gradient problem, the problem can still occur in practice, especially in deep networks.
The exploding gradient problem arises when the gradients become extremely large during training. While the sigmoid function doesn’t naturally cause this, other factors in the network can contribute to it. For example:
Initialization: If the weights are initialized too large, or the initialization scheme is poorly chosen, the gradients can grow exponentially during backpropagation. Even though the sigmoid's derivative is at most 0.25, a layer can still amplify the gradient when its weights are large enough to outweigh that factor (see the sketch after this list).
Poorly designed architectures: Deep networks with a large number of layers multiply many per-layer factors together, so any per-layer amplification compounds with depth and makes the network more susceptible to exploding gradients.
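To see how large weights can overpower the sigmoid's small derivative, here is a deliberately contrived NumPy sketch (the scalar chain, the weight values, and the bias trick are all made up for illustration): the bias keeps every pre-activation at 0, so sigmoid'(z) sits at its maximum of 0.25, and the backprop factor per layer becomes 0.25 * w, which compounds into an exploding gradient as soon as |w| > 4.

```python
# Contrived sketch: a chain of scalar "layers" a_l = sigmoid(w * a_{l-1} - w/2)
# with input a_0 = 0.5. The bias -w/2 keeps every pre-activation at 0, so
# sigmoid'(z) stays at its maximum of 0.25 and the backprop factor per layer
# is exactly 0.25 * w.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(w, depth):
    """Gradient of the final activation w.r.t. the input a_0 = 0.5."""
    a, grad = 0.5, 1.0
    for _ in range(depth):
        z = w * a - w / 2.0          # stays at 0 in this toy setup
        a = sigmoid(z)               # stays at 0.5
        grad *= w * a * (1.0 - a)    # chain rule: w * sigmoid'(z)
    return grad

print(input_gradient(w=2.0, depth=30))   # ~9.3e-10 -> vanishing
print(input_gradient(w=6.0, depth=30))   # ~1.9e+05 -> exploding
```

In real sigmoid networks the pre-activations usually drift into the saturated region, where the derivative collapses, so vanishing gradients are by far the more common failure mode; the sketch only shows that the bounded derivative by itself doesn't rule explosion out when the weights are large.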
While the sigmoid function itself doesn’t push gradients to explode, in practice, it can still happen due to these other factors.
Another cause of exploding gradients: When the input features span a wide range of values and you're using a fixed learning rate, it can be difficult to find a learning rate that is large enough to converge in a reasonable amount of time but not so large that the gradients and weight updates blow up.
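As a hedged illustration of that scaling effect (the single sigmoid unit, the feature ranges, and the labels below are all invented): the gradient w.r.t. each weight scales with the corresponding input feature, so a feature in the thousands produces a gradient orders of magnitude larger than a feature in [0, 1], and no single fixed learning rate suits both.

```python
# Made-up example: two input features on very different scales feeding one
# sigmoid unit. The gradient w.r.t. each weight scales with its input
# feature, so the raw features give wildly mismatched gradient magnitudes;
# standardizing the features makes them comparable.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weight_gradients(X, y, w):
    """Gradient of a squared-error loss for y_hat = sigmoid(X @ w)
    (constant factors dropped)."""
    y_hat = sigmoid(X @ w)
    delta = (y_hat - y) * y_hat * (1.0 - y_hat)   # chain rule through sigmoid
    return X.T @ delta / len(y)

n = 256
x1 = rng.uniform(0.0, 1.0, n)         # small-scale feature
x2 = rng.uniform(0.0, 5000.0, n)      # large-scale feature
X_raw = np.column_stack([x1, x2])
y = (x1 + x2 / 5000.0 > 1.0).astype(float)
w = np.zeros(2)

print("raw features :", weight_gradients(X_raw, y, w))
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)   # standardize
print("standardized :", weight_gradients(X_std, y, w))
# The raw-feature gradient for the second weight is orders of magnitude
# larger than for the first; after standardization they are comparable.
```

This is why standardizing the inputs (and, when necessary, clipping the gradient norm) is usually the first thing to try against this kind of blow-up.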