Gradient descent and derivatives

Hi all,

I am currently watching the gradient descent lecture and I don't understand why Andrew doesn't use the first and second derivatives of the cost function to find the global minimum of the function directly.

Hey @Kaulfield,

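Well, while the second derivative provides valuable information about the curvature of the cost function, it is often too computationally expensive, and unnecessary, for many machine learning problems. For a model with n parameters the gradient has n entries, but the matrix of second derivatives (the Hessian) has n² entries, and a Newton-style step also has to solve a linear system with that matrix at every iteration. Gradient descent, which relies only on the first derivative, strikes a balance between computational efficiency and effectiveness in finding good parameter values in most practical scenarios.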
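To make "relies only on the first derivative" concrete, here is a minimal sketch assuming the univariate linear regression model from the course, f(x) = w·x + b, with the squared-error cost J(w, b) = (1/2m) Σ (f(xᵢ) − yᵢ)². The toy data, the learning rate `alpha` and the iteration count are made up for illustration; each step needs just the two partial derivatives, never any second derivatives.

```python
def compute_gradient(x, y, w, b):
    """Partial derivatives of J(w, b) = (1/2m) * sum((w*x_i + b - y_i)**2)."""
    m = len(x)
    dj_dw = 0.0
    dj_db = 0.0
    for xi, yi in zip(x, y):
        err = w * xi + b - yi  # prediction error for one example
        dj_dw += err * xi
        dj_db += err
    return dj_dw / m, dj_db / m


def gradient_descent(x, y, w=0.0, b=0.0, alpha=0.1, num_iters=1000):
    """Repeatedly step against the gradient; only first derivatives are used."""
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        # Simultaneous update of both parameters
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return w, b


# Hypothetical toy data lying exactly on y = 2x + 1
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(x, y)
print(round(w, 3), round(b, 3))  # expected to be close to 2.0 1.0
```

The cost per step here grows linearly with the number of parameters, which is the efficiency trade-off described above.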



Right! It's a natural question to ask: "If I want the minimum of J, why don't I just take the derivative, set it to 0 and solve?" The problem is that this just gives you a different problem, finding where the derivative equals zero, which in general is also not "solvable" in closed form and has to be approximated numerically. So, as Jamal says, you'd need to take the second derivative in order to implement some multidimensional version of Newton-Raphson to find the zeros of the first derivative. It turns out that this adds a lot of complexity without much practical benefit, so it's simpler just to do direct Gradient Descent on the cost J.
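A small one-dimensional sketch of the two approaches, using a made-up convex cost J(w) = w² + eʷ chosen only because J′(w) = 2w + eʷ = 0 has no elementary closed-form solution. Newton-Raphson applies the update w ← w − J′(w)/J″(w) to find the zero of the first derivative; gradient descent uses w ← w − α·J′(w). The starting point, `alpha` and tolerance are illustrative assumptions.

```python
import math


def dJ(w):
    """First derivative of the example cost J(w) = w**2 + exp(w)."""
    return 2 * w + math.exp(w)


def d2J(w):
    """Second derivative of J; always > 0, so J is convex with one minimum."""
    return 2 + math.exp(w)


def newton(w=0.0, tol=1e-10, max_iters=100):
    """Newton-Raphson on J'(w) = 0: needs first AND second derivatives."""
    for i in range(max_iters):
        g = dJ(w)
        if abs(g) < tol:
            return w, i
        w -= g / d2J(w)
    return w, max_iters


def gradient_descent(w=0.0, alpha=0.1, tol=1e-10, max_iters=10_000):
    """Plain gradient descent: needs only the first derivative."""
    for i in range(max_iters):
        g = dJ(w)
        if abs(g) < tol:
            return w, i
        w -= alpha * g
    return w, max_iters


w_newton, n_newton = newton()
w_gd, n_gd = gradient_descent()
print(f"Newton: w = {w_newton:.6f} after {n_newton} steps")
print(f"GD:     w = {w_gd:.6f} after {n_gd} steps")
```

Both should land on the same minimum (about w ≈ −0.3517), and Newton takes far fewer steps. With one parameter, J″ is a single number, so each Newton step is cheap; with n parameters it becomes the n × n Hessian plus a linear solve per step, which is where the extra complexity mentioned above comes from.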
