Activation functions Lecture Video 4:13

Anbu · May 18, 2021, 10:16am

Dear Mentor,

We cannot understand the intuition behind why gradient descent slows down when the slope is close to zero. Usually, a slope of zero means gradient descent has converged to the global minimum, right? So how can gradient descent slow down when the slope of the function is close to zero? Can you please help us understand this?

Now, one of the downsides of both the sigmoid function and the tanh function is that if z is either very large or very small, then the gradient, or the derivative, or the slope of this function becomes very small. So if z is very large or z is very small, the slope of the function ends up being close to 0. And so this can slow down gradient descent.
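To make the lecture's point concrete, here is a small sketch (standard library only) that evaluates both derivatives at a few values of z, using the standard identities sigmoid'(z) = sigmoid(z)(1 - sigmoid(z)) and tanh'(z) = 1 - tanh(z)^2. The helper names are illustrative, not taken from the course code.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z: float) -> float:
    """Slope of the sigmoid: sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh_prime(z: float) -> float:
    """Slope of tanh: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2


# Compare the slopes near the origin and in the saturated regions.
for z in [0.0, 2.0, 5.0, 10.0, -10.0]:
    print(f"z = {z:6.1f}   sigmoid'(z) = {sigmoid_prime(z):.2e}   tanh'(z) = {tanh_prime(z):.2e}")
```

The sigmoid's slope peaks at 0.25 at z = 0 and tanh's peaks at 1, and both shrink by several orders of magnitude by the time |z| reaches 10.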

Nicolas · May 23, 2021, 10:19am

When a differentiable function is close to its minimum, its slope is almost 0. But the converse does not necessarily hold, as you can see with the sigmoid and tanh functions: for those functions, the slope goes to 0 only when your x variable goes to +∞ or −∞ (and neither tanh nor sigmoid has a minimum or a maximum).
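The claim about sigmoid and tanh can be checked directly from their derivatives (writing the input as z, as in the lecture):

$$
\begin{aligned}
\sigma(z) &= \frac{1}{1+e^{-z}}, \qquad \sigma'(z) = \sigma(z)\bigl(1-\sigma(z)\bigr) \in \left(0, \tfrac{1}{4}\right],\\
\tanh'(z) &= 1-\tanh^2(z) \in (0, 1],\\
\lim_{z\to\pm\infty}\sigma'(z) &= \lim_{z\to\pm\infty}\tanh'(z) = 0.
\end{aligned}
$$

Both derivatives are strictly positive for every finite z, so neither function has a stationary point, and therefore no minimum or maximum. A near-zero slope only tells you that |z| is large; for example, σ'(10) ≈ 4.5 × 10⁻⁵.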
When you use gradient descent, you hope that your optimization moves toward a local minimum. Yet you can run into trouble when the gradient is numerically tiny even though you are still far from the minimum you're looking for. In that case, your parameters receive very small updates, and the deltas backpropagated to earlier layers are almost zero.
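To illustrate the last point about backpropagated deltas, here is a minimal sketch assuming a hypothetical chain of scalar tanh units, each connected to the next by the same weight (1.0 by default). Going backward, each layer multiplies the incoming delta by the weight and by tanh'(z) at its own pre-activation, so saturated units shrink the signal at every layer.

```python
import math


def tanh_prime(z: float) -> float:
    """Slope of tanh: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2


def backprop_delta(pre_activations, upstream=1.0, weight=1.0):
    """Propagate an error signal backward through a hypothetical chain of
    scalar tanh units; pre_activations lists z for each layer, first to last."""
    delta = upstream
    for z in reversed(pre_activations):
        # Chain rule for one scalar layer: delta_l = w * delta_{l+1} * tanh'(z_l)
        delta = delta * weight * tanh_prime(z)
    return delta


print(backprop_delta([0.0, 0.0, 0.0]))  # → 1.0 (no saturation, signal preserved)
print(backprop_delta([3.0, 3.0, 3.0]))  # saturated units: signal nearly vanishes
```

With z = 3 in every layer, each factor is about 0.01, so after only three layers the delta is on the order of 1e-6.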

I hope this helps; do not hesitate to ask further questions.

Anbu · May 25, 2021, 10:52am

Sir, here is my understanding of your points. Can you please tell me whether I have understood correctly?

My understanding is: even if the variable x is very far from the minimum, its gradients become very small early on because z is very large, so gradient descent takes a long time to reach the minimum point. Am I right, sir? That's how it
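This understanding can be checked with a toy experiment, written as a minimal sketch under stated assumptions: a single hypothetical sigmoid unit with weight w, input x = 1 (so z = w), target y = 0.9, and squared-error loss L(w) = ½(σ(w) − y)². The squared-error loss is chosen deliberately so that the sigmoid's slope appears in the gradient; with cross-entropy loss on a sigmoid output unit, that factor cancels and dL/dz = a − y. The learning rate, tolerance, and starting points are arbitrary choices.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def steps_to_fit(w0, y=0.9, lr=1.0, tol=0.01, max_iters=1_000_000):
    """Plain gradient descent on L(w) = 0.5 * (sigmoid(w) - y)**2 for one
    hypothetical unit with input x = 1. Returns the number of updates taken
    until sigmoid(w) is within tol of the target y (or max_iters)."""
    w = w0
    for step in range(max_iters):
        a = sigmoid(w)
        if abs(a - y) < tol:
            return step
        # Chain rule: dL/dw = (a - y) * sigmoid'(w), with sigmoid'(w) = a * (1 - a)
        grad = (a - y) * a * (1.0 - a)
        w -= lr * grad
    return max_iters


print("start at w = 0:  ", steps_to_fit(0.0))
print("start at w = -10:", steps_to_fit(-10.0))
```

Both runs begin with a large loss, but the saturated start (w = −10) needs many times more updates, because the gradient there is tiny even though the unit is far from the target; most of those updates are spent creeping out of the flat region of the sigmoid.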
