Why Gradient Descent is required

paulinpaloalto · October 28, 2022, 4:13pm

Exactly: the “set the derivative to zero and solve” method doesn’t help. Setting the gradient of the cost to zero just gives you another system of equations that can’t be solved in closed form (as Gordon pointed out), so you still need an “iterative approximation” method, such as the multidimensional equivalent of Newton-Raphson. And if you think about it, applying Newton-Raphson to the gradient means you also need the second derivatives of the cost (the Hessian). So that route only makes the problem more complicated. Running GD directly on the cost, which needs only first derivatives, is more straightforward.
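Here is a minimal sketch of that point. It assumes a toy one-variable cost J(w) = log(1 + exp(-w)) + 0.5·w², chosen only for illustration (it isn't from the thread). Setting dJ/dw = 0 gives w = 1/(1 + e^w), which has no closed-form solution. Newton-Raphson on that equation needs d²J/dw² at every step, while GD on J needs only dJ/dw. Both end up at the same w.

```python
import math

# Toy 1-D cost chosen for illustration (an assumption, not from the thread):
#   J(w) = log(1 + exp(-w)) + 0.5 * w**2
# Setting dJ/dw = 0 gives w = 1 / (1 + exp(w)), which has no closed-form solution.

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def grad(w: float) -> float:
    # dJ/dw = w - sigmoid(-w)
    return w - sigmoid(-w)

def second_deriv(w: float) -> float:
    # d2J/dw2 = 1 + s * (1 - s), where s = sigmoid(-w)
    s = sigmoid(-w)
    return 1.0 + s * (1.0 - s)

def gradient_descent(w: float, lr: float = 0.5, steps: int = 200) -> float:
    # Plain GD on the cost: each step uses only the first derivative.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def newton_on_gradient(w: float, steps: int = 20) -> float:
    # "Set the derivative to zero and solve": Newton-Raphson on dJ/dw = 0.
    # Every step also needs the second derivative of the cost.
    for _ in range(steps):
        w -= grad(w) / second_deriv(w)
    return w

w_gd = gradient_descent(0.0)
w_newton = newton_on_gradient(0.0)
print(f"GD:     w = {w_gd:.6f}, dJ/dw = {grad(w_gd):.1e}")
print(f"Newton: w = {w_newton:.6f}, dJ/dw = {grad(w_newton):.1e}")
```

With n parameters, second_deriv turns into an n×n Hessian matrix (and each Newton step needs a linear solve with it), which is why the Newton route gets expensive quickly for networks with many weights.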

The OP is correct that we have to worry about local minima, saddle points and the like, but it just turns out that the mathematics work in our favor here. There is a paper from Yann LeCun’s group which shows that for networks that are sufficiently complex, there is a range of good solutions that can be found by GD.
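To make the saddle-point concern concrete, here is a toy sketch. The cost J(x, y) = x² + y⁴/4 − y²/2 is an assumption chosen for illustration; it does not reproduce the result from the paper mentioned above. It only shows that “gradient = 0” also picks up saddle points, and that in this toy case GD started just off the saddle moves away from it into a minimum.

```python
import math

# Toy 2-D cost chosen for illustration (an assumption, not from the thread):
#   J(x, y) = x**2 + y**4 / 4 - y**2 / 2
# Setting the gradient to zero gives three solutions: (0, 0), (0, 1) and (0, -1).

def grad(x: float, y: float) -> tuple[float, float]:
    return 2.0 * x, y**3 - y

def hessian_eigenvalues(x: float, y: float) -> tuple[float, float]:
    # Hessian of J is [[2, 0], [0, 3y^2 - 1]]; use the closed-form
    # eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]].
    a, b, d = 2.0, 0.0, 3.0 * y**2 - 1.0
    mid = (a + d) / 2.0
    rad = math.sqrt(((a - d) / 2.0) ** 2 + b**2)
    return mid - rad, mid + rad

def classify(x: float, y: float) -> str:
    # Assumes no zero eigenvalue (true at the three stationary points here).
    low, high = hessian_eigenvalues(x, y)
    if low > 0:
        return "local minimum"
    if high < 0:
        return "local maximum"
    return "saddle point"

# Every solution of grad = 0 is a candidate; the Hessian tells them apart.
for px, py in [(0.0, 0.0), (0.0, 1.0), (0.0, -1.0)]:
    print((px, py), "->", classify(px, py))

# GD started just off the saddle at (0, 0) slides away from it.
x, y, lr = 0.5, 0.01, 0.1
for _ in range(500):
    gx, gy = grad(x, y)
    x, y = x - lr * gx, y - lr * gy
print(f"GD ends at ({x:.6f}, {y:.6f}) -> {classify(x, y)}")
```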
