Gradient Descent vs Newton's Method

It’s an interesting question that comes up pretty frequently. If you took calculus, you remember that to find an extremum, you can set the derivative to zero and solve. But for a cost function like ours, that gives you an equation with no closed form solution, so you have to resort to something like Newton-Raphson in multiple dimensions. Think about what that means: now you need the second derivative of the cost surface (the Hessian matrix) just to find the zeros of the first derivative. Computing, storing, and inverting the Hessian gets expensive as the number of parameters grows, so it ends up making things more complicated without really giving you any advantage here.
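To make the comparison concrete, here's a minimal sketch of both methods on a made-up toy cost function (not one from the course). Gradient descent only needs the first derivative; Newton-Raphson also needs the Hessian and solves a linear system at every step:

```python
import numpy as np

# Toy convex cost surface with its unique minimum at w = [0, 0].
# (Hypothetical example, chosen so f'(w) = 0 has no closed-form solution path.)
def f(w):
    return np.sum(w**4) + np.sum(w**2)

def grad(w):                       # first derivative (gradient): 4w^3 + 2w
    return 4 * w**3 + 2 * w

def hessian(w):                    # second derivative (Hessian matrix)
    return np.diag(12 * w**2 + 2)

# Gradient descent: just step downhill along the gradient.
w_gd = np.array([2.0, -1.5])
for _ in range(200):
    w_gd = w_gd - 0.05 * grad(w_gd)

# Newton-Raphson: solve H @ step = grad at every iteration.
w_nt = np.array([2.0, -1.5])
for _ in range(10):
    w_nt = w_nt - np.linalg.solve(hessian(w_nt), grad(w_nt))

print(w_gd)   # close to [0, 0]
print(w_nt)   # close to [0, 0] in far fewer iterations
```

Newton converges in far fewer iterations here, but each iteration needs the full Hessian, and for a network with millions of parameters that matrix doesn't even fit in memory, which is the practical point above.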

Here’s a previous discussion of this point in the context of MLS, and here’s one from DLS. You can find more by searching the forums for “Newton” or “Newton-Raphson”.