Do practitioners ever use Newton’s method to minimize a cost function?
(or does this method only work for ‘nice’ functions?)
I would imagine that Newton’s method would converge to the minimum in fewer steps than gradient descent, without having to search for a good learning rate.
It’s an interesting question that comes up pretty frequently. If you took calculus, you remember that to find an extremum you can set the derivative to zero and solve. But for these cost functions that gives you an equation with no closed-form solution, so you have to resort to something like Newton-Raphson in multiple dimensions. Think about what that means, though: now you need the second derivative of the cost surface (the Hessian) in order to find the zeros of the first derivative. With n parameters the Hessian is an n × n matrix, so computing, storing, and solving with it gets expensive quickly. It just ends up making things more complicated without really giving you an advantage over plain gradient descent.
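To make the comparison concrete, here’s a minimal toy sketch (my own example, not course code; the names `A`, `b`, `grad`, `hess` are just illustrative) on a 2-parameter quadratic cost. One Newton step solves a linear system with the Hessian and lands right on the minimizer, while gradient descent needs many steps and a hand-tuned learning rate:

```python
import numpy as np

# Toy quadratic cost J(w) = 0.5 * w^T A w - b^T w (values chosen arbitrarily)
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])   # positive-definite curvature; also the Hessian of J
b = np.array([1.0, 2.0])

def grad(w):
    return A @ w - b          # first derivative of J

def hess(w):
    return A                  # second derivative of J (constant for a quadratic)

w_newton = np.zeros(2)
w_gd = np.zeros(2)
alpha = 0.1                   # hand-tuned learning rate for gradient descent

# Newton-Raphson: solve H * step = grad rather than inverting H explicitly
w_newton = w_newton - np.linalg.solve(hess(w_newton), grad(w_newton))

# Gradient descent: many small steps toward the same point
for _ in range(100):
    w_gd = w_gd - alpha * grad(w_gd)

w_star = np.linalg.solve(A, b)  # exact minimizer, for comparison
print("Newton after 1 step: ", w_newton)
print("GD after 100 steps:  ", w_gd)
print("True minimizer:      ", w_star)
```

On a quadratic, Newton converges in a single step because the Hessian is exact, which is the best case for it. The catch is that each step requires forming and solving with that n × n Hessian, which is exactly what becomes impractical when n is the number of weights in a large model.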
Here’s a previous discussion of this point in the context of MLS. Here’s one from DLS. You can find more by searching for “Newton” or “Newton-Raphson”.
Not in my experience.