Activation functions Lecture Video 4:13

Dear Mentor,

I cannot understand the intuition behind why gradient descent slows down when the slope is close to zero. Usually, a slope of zero means gradient descent has converged to a (global) minimum, right? Then how does gradient descent slow down when the slope of the function is close to zero? Can you please help me understand this?

Now, one of the downsides of both the sigmoid function and the tanh function is that if z is either very large or very small, then the gradient or the derivative or the slope of this function becomes very small. So if z is very large or z is very small, the slope of the function ends up being close to 0. And so this can slow down gradient descent.
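To see concretely how flat these activations get, here is a small sketch (my own illustration, not from the lecture) that evaluates the derivatives of sigmoid and tanh at a few values of z:

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of sigmoid: s(z) * (1 - s(z)); maximal at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2; maximal at z = 0."""
    return 1.0 - math.tanh(z) ** 2

# The derivative shrinks rapidly as |z| grows, even though
# neither function has a minimum anywhere.
for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z={z:5.1f}  sigmoid'={sigmoid_grad(z):.2e}  tanh'={tanh_grad(z):.2e}")
```

At z = 0 the sigmoid's derivative is 0.25, but by z = 10 it has dropped to about 5e-5 (and tanh's to about 8e-9), which is the "slope close to 0" the lecture refers to.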

When a differentiable function is close to its minimum, its slope is almost 0. But the converse does not necessarily hold, as you can see with the sigmoid and tanh functions. For those functions, the slope going to 0 means that your x variable goes to + or - infinity (and neither tanh nor sigmoid has a minimum or a maximum).
When you use gradient descent, you hope that the optimization moves toward a local minimum. Yet you can run into trouble when the gradient is numerically too small even though you are still far from the minimum you are looking for. In that case, the variables are updated by very small amounts, and the deltas to be backpropagated are almost zero.
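To illustrate this point with a toy example of my own (not from the course), gradient descent on a loss built from a sigmoid barely moves when it starts on the flat plateau, even though the loss there is far from its minimum:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_grad(w):
    # Toy loss L(w) = sigmoid(w)^2, whose gradient is
    # 2 * sigmoid(w) * sigmoid'(w). For large positive w this
    # vanishes, because sigmoid'(w) -> 0 even though L(w) is near 1.
    s = sigmoid(w)
    return 2.0 * s * s * (1.0 - s)

def descend(w0, lr=0.5, steps=200):
    """Plain gradient descent: w <- w - lr * dL/dw."""
    w = w0
    for _ in range(steps):
        w -= lr * loss_grad(w)
    return w

# Start on the flat plateau (large z): the loss is near its maximum,
# yet the updates are tiny and w barely moves in 200 steps.
w_flat = descend(10.0)

# Start where the slope is healthy (z near 0): real progress is made.
w_steep = descend(0.0)

print(f"started at 10.0 -> {w_flat:.4f}")
print(f"started at  0.0 -> {w_steep:.4f}")
```

Both runs use the same learning rate and the same number of steps; only the starting point differs. This is exactly the failure mode described above: a numerically tiny gradient far from the minimum makes the updates, and the backpropagated deltas, almost zero.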

I hope it helps, or do not hesitate to ask further :slight_smile:

Sir, here is my understanding of your points. Can you please tell me if I have understood properly?

My understanding is: even if the variable x is very far away from the minimum, the gradients become very small early on because z is very large, so the gradient descent updates take a long time to reach the minimum point. Am I right, sir?