Hi all,
I am currently watching the gradient descent class and I don’t get why Andrew is not using the first and second derivatives of the cost function to find the global minimum.
Thanks.
Hey @Kaulfield,
While the second derivative does provide valuable information about the curvature of the cost function, computing it is often too expensive and unnecessary for many machine learning problems. Gradient descent, which relies only on the first derivative, strikes a balance between computational efficiency and effectiveness in finding good parameter values for most practical scenarios.
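As a minimal sketch of what that looks like (my own toy example with a one-parameter quadratic cost, not code from the course), each update only ever needs the first derivative dJ/dw:

```python
# Toy gradient descent on J(w) = (w - 3)^2; only the first derivative is used.

def cost(w):
    return (w - 3.0) ** 2

def gradient(w):
    # dJ/dw = 2 * (w - 3)
    return 2.0 * (w - 3.0)

w = 0.0       # initial guess
alpha = 0.1   # learning rate
for step in range(100):
    w = w - alpha * gradient(w)   # update rule: w := w - alpha * dJ/dw

print(w, cost(w))  # w converges toward the minimizer w = 3
```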
Regards,
Jamal
Right! It’s a natural question to ask “If I want the minimum of J, why don’t I just take the derivative, set it to 0 and solve?” The problem is that this just gives you a different numerical approximation problem that is also not solvable in closed form. So, as Jamal says, you’d need the second derivative in order to implement some multidimensional version of Newton-Raphson to find the zeros of the first derivative. It turns out that just adds a bunch of complexity and doesn’t really help. It’s simpler to do direct Gradient Descent on the cost J.
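To make the comparison concrete, here is a hedged one-dimensional sketch (my own illustrative cost function, not from the course) of the two update rules side by side. Newton’s step w := w - J'(w)/J''(w) needs the second derivative as well; gradient descent only needs J'(w). In n dimensions the second derivative becomes an n×n Hessian matrix that has to be computed and solved against, which is where the extra cost and complexity come from.

```python
import numpy as np

def J(w):
    return (w - 3.0) ** 2 + 0.5 * np.sin(w)   # a slightly non-quadratic cost

def dJ(w):
    return 2.0 * (w - 3.0) + 0.5 * np.cos(w)  # first derivative

def d2J(w):
    return 2.0 - 0.5 * np.sin(w)              # second derivative

w_gd, w_newton = 0.0, 0.0
for _ in range(50):
    w_gd = w_gd - 0.1 * dJ(w_gd)                          # gradient descent: first derivative only
    w_newton = w_newton - dJ(w_newton) / d2J(w_newton)    # Newton step: also needs the second derivative

print(w_gd, w_newton)  # both approach the same minimizer near w = 3
```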