In the video, Andrew defined a function f(a). Isn’t the derivative the slope of the tangent at a given point on the curve? By that logic, wouldn’t the derivative of f(a) on that graph be 0, since we cannot draw a tangent and hence cannot find the derivative of the function? Any thoughts on this? What am I missing?
The derivative of the sigmoid activation function is not zero for inputs close to zero (it peaks at 0.25 when the input is 0). However, as you can see, for inputs far from zero the gradient approaches zero. When gradients get that small, learning stalls, which is one reason why the ReLU activation function is so popular.
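To put rough numbers on that, here is a minimal sketch in plain Python. The helper names (`sigmoid`, `sigmoid_derivative`, `relu_derivative`) and the sample inputs are my own choices for illustration, not something from the course.

```python
import math

def sigmoid(z):
    # Standard logistic function: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_derivative(z):
    # ReLU has slope 1 for positive inputs and 0 otherwise
    # (the value at exactly z == 0 is a convention; 0 is assumed here)
    return 1.0 if z > 0 else 0.0

# Sample inputs chosen for illustration only
for z in (0.0, 2.0, 10.0):
    print(z, sigmoid_derivative(z), relu_derivative(z))
```

The sigmoid gradient is largest at 0 and shrinks quickly as the input grows, while the ReLU gradient stays at 1 for any positive input.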
Hey, I’m not talking about the derivative of the sigmoid function. In the week 2 video named Derivatives, Andrew gave the example f(a) = 3a and computed the derivative of this function.
I cleared up my doubt, though. The derivative of such a function turns out to be constant: for f(a) = 3a the slope is 3 at every point, so however ‘a’ changes, the slope stays the same.
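In case it helps anyone else, a quick numerical check of this, as a minimal sketch: nudge ‘a’ by a small amount and see how much f(a) changes. The function name `numerical_slope`, the step size of 0.001, and the sample points are assumptions I picked for illustration.

```python
def f(a):
    # The example function from the thread
    return 3 * a

def numerical_slope(func, a, h=1e-3):
    # Approximate the derivative as rise over run for a small nudge h
    # (h = 0.001 is an assumed step size)
    return (func(a + h) - func(a)) / h

# Evaluate the slope at a few arbitrary points
slopes = [numerical_slope(f, a) for a in (2.0, 5.0, -1.0)]
print(slopes)
```

Each value comes out at approximately 3 regardless of where it is evaluated, which matches the idea that a straight line has the same slope everywhere.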
Thanks for the reply.