Activation function, Lecture Video 4:13

Dear Mentor,

We cannot understand the intuition behind why gradient descent slows down when the slope is close to zero. Usually, when the slope is zero, it means gradient descent has converged to the global minimum, right? Then how does gradient descent slow down when the slope of the function is close to zero? Could you please help us understand this?

Now, one of the downsides of both the sigmoid function and the tanh function is that if z is either very large or very small, then the gradient, or the derivative, or the slope of this function becomes very small. So if z is very large or z is very small, the slope of the function ends up being close to 0, and so this can slow down gradient descent.
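To make that concrete, here is a minimal NumPy sketch (the helper names `sigmoid` and `sigmoid_derivative` are just for this illustration) that prints the sigmoid's slope at a few values of z. Notice how quickly the slope collapses toward zero once z moves away from 0:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # slope of the sigmoid: sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z = {z:5.1f}  slope = {sigmoid_derivative(z):.6f}")
# z =   0.0  slope = 0.250000
# z =   2.0  slope = 0.104994
# z =   5.0  slope = 0.006648
# z =  10.0  slope = 0.000045
```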

W and b are updated using dW and db (scaled by the learning rate), and dW and db are derivatives. If you look at the curve given in the course, you see that it is almost horizontal for very small or very large values of z. This means that the derivatives at these values are almost zero (very close to zero). If you update W and b with very small values of dW and db, the new values are almost identical to the old W and b, so your algorithm only moves very slowly.
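Here is a small numerical sketch of that update, W := W - learning_rate * dW (the learning rate and slope values are made up for illustration), showing why a near-zero derivative means almost no movement:

```python
# Gradient-descent update: W := W - learning_rate * dW
learning_rate = 0.1   # example value
W = 0.5               # example current parameter

dW_steep = 0.25        # slope in the "steep" region of the activation
dW_saturated = 0.00005 # slope when z is very large or very small

print(W - learning_rate * dW_steep)      # 0.475     -> a visible step
print(W - learning_rate * dW_saturated)  # 0.499995  -> W barely changes
```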

If you look at the curve, you will also see that it is not convex. Here I think you have to distinguish between different things discussed in the course: the slope that is close to zero in this lecture is the slope of the activation function with respect to z, not the slope of the cost function that gradient descent is minimizing when it converges to the global minimum.