Why Gradient Descent is required

We can differentiate the cost function and find the parameters by solving the equations obtained by setting the partial derivative with respect to every parameter to zero, which tells us where the cost function is at a minimum. I also think it's possible to find multiple places where the derivatives are zero, so we could check all such places and pick out the global minimum.

Why is gradient descent performed instead?

@prtata, that’s a good question. We all wrestle with it for a time.

If we could find a closed-form expression for the derivatives that we could solve for where they are zero, we would do that. But such closed forms don't exist for anything realistic.
We need an iterative mechanism, and gradient descent (with its many variations you'll see in Course 2) is a good choice for performing that iteration. The general approach of defining a loss/cost function and then minimizing it through gradient descent is used in many machine learning methods.
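
To make the iteration concrete, here is a minimal sketch of plain gradient descent in Python on a toy one-parameter cost. The cost function, learning rate, and step count are all invented for illustration:

```python
# Toy cost J(w) = (w - 3)^2, whose minimum we already know is at w = 3.
def cost(w):
    return (w - 3.0) ** 2

def gradient(w):
    # dJ/dw, derived by hand for this toy cost
    return 2.0 * (w - 3.0)

w = 0.0       # initial guess
alpha = 0.1   # learning rate
for step in range(100):
    w = w - alpha * gradient(w)   # the gradient descent update rule

print(w, cost(w))  # w is ~3.0: the loop converges without solving any equation
```

The same loop, with the gradient supplied by backpropagation instead of a hand-derived formula, is essentially what training a network amounts to.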

My two cents:
In school, when we want to find the minimum of a function whose gradient can be calculated, we normally don't apply GD. Instead, we just find the minimum analytically by solving for where the gradient is 0. But when the calculations are too complex for that to be tractable, as in the optimization problems we face here, GD is really the faster way to solve the problem!
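
Here is a small sketch of that contrast in Python/NumPy for the one setting where both routes are available, linear regression (the data and hyperparameters are invented for illustration): the analytic route solves the normal equation in one shot, while GD iterates its way to the same answer.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # 100 samples, 3 features
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# "School" route: set the gradient of the squared-error cost to zero and
# solve; for linear regression this is the normal equation X^T X w = X^T y.
w_analytic = np.linalg.solve(X.T @ X, X.T @ y)

# Iterative route: gradient descent on the same mean squared error cost.
w_gd = np.zeros(3)
alpha = 0.1
for _ in range(2000):
    grad = (2.0 / len(y)) * X.T @ (X @ w_gd - y)  # gradient of the MSE
    w_gd -= alpha * grad

print(np.max(np.abs(w_analytic - w_gd)))  # ~0: both land on the same minimum
```

For a neural network the analytic branch simply isn't available (there is no normal equation to write down), so only the loop survives.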

Exactly: the “set the derivative to zero and solve” method doesn't help. It just makes things more complicated, because you now have another equation that can't be solved in closed form (as Gordon pointed out). So you need another iterative approximation method, like the multidimensional equivalent of Newton-Raphson. But if you think about it, that means you also need the second derivatives of the cost. So it just makes the problem more complicated. Doing GD directly on the cost is more straightforward.
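
A toy sketch of that Newton step for just two parameters shows the cost of those second derivatives (the quadratic cost here is invented for illustration): every update needs the full Hessian, an n × n matrix for n parameters, where GD needs only the n first derivatives.

```python
import numpy as np

# Toy cost J(w) = (w0 - 1)^2 + 5 * (w1 - 2)^2, with its minimum at (1, 2).
def grad(w):
    return np.array([2.0 * (w[0] - 1.0), 10.0 * (w[1] - 2.0)])

def hessian(w):
    # Newton's price of admission: all the second derivatives.
    return np.array([[2.0, 0.0],
                     [0.0, 10.0]])

w = np.array([0.0, 0.0])
for _ in range(5):
    w = w - np.linalg.solve(hessian(w), grad(w))  # Newton update

print(w)  # [1. 2.] -- found after a single step, since this cost is exactly quadratic
```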

The OP is correct that we have to worry about local minima, saddle points and the like, but it turns out that the mathematics works in our favor here. There is a paper from Yann LeCun's group showing that for networks that are sufficiently complex, there is a whole range of good solutions that can be found by GD.
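
A 1-D toy makes both halves of that visible (the polynomial cost and step size are invented for illustration): setting the derivative to zero really does produce several candidates to check, and plain GD really does settle into whichever basin it starts in.

```python
import numpy as np

# Nonconvex cost J(w) = w^4 - 3w^2 + w, with derivative J'(w) = 4w^3 - 6w + 1.
def grad(w):
    return 4.0 * w**3 - 6.0 * w + 1.0

# The OP's idea: find every place where the derivative is zero and compare them.
critical_points = np.roots([4.0, 0.0, -6.0, 1.0])
print(np.sort(critical_points.real))  # ~[-1.30, 0.17, 1.13]: two minima and a maximum

# GD from different starting points settles into different basins.
for w in (-2.0, 2.0):
    for _ in range(1000):
        w -= 0.01 * grad(w)
    print(w)  # ~-1.30 (the global minimum) from one start, ~1.13 (a local one) from the other
```

The LeCun result is the reassurance that, for large enough networks, the basin you land in is almost always one of the good ones.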
