Why don't we use the derivative of the cost function and set it to zero to find the local minimum?

To find the local minimum of the cost function we use gradient descent. Why don’t we directly take the derivative of the cost function and set it to zero? That is the straightforward way to find the slope of a function (the cost function).

1 Like

You can do this, but only for one specific form of cost function (the “squared error” cost function that is used in linear regression).

The method is called the Normal equation.
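As a sketch of what the Normal equation looks like in practice (the data here is made up for illustration): for the squared-error cost, setting the gradient to zero gives the linear system (XᵀX)w = Xᵀy, which can be solved directly.

```python
import numpy as np

# Hypothetical data: feature matrix X (with a bias column) and targets y,
# generated from known weights so we can check the answer.
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

# Normal equation: set the gradient of the squared-error cost to zero
# and solve (X^T X) w = X^T y in one shot -- no iteration needed.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # close to [2.0, -1.0, 0.5]
```

Note this only works because the squared-error cost is quadratic in the weights, so its gradient is linear and the system has a direct solution.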

There is no closed form equation for the best weights for any other cost function.

1 Like

As Tom says, there is no “closed form” solution in any other case but that one. It’s a perfectly reasonable question to ask why we don’t do the “set the derivative to zero and solve” in order to find the minimum, but it just turns out that it makes things more complicated, not less, in the case that there are no “closed form” solutions. Setting the derivative to zero and trying to solve just gives you another equation that you have to solve with some type of “iterative approximation” method. In that case, you’d probably use the multidimensional analog of Newton-Raphson to find the zeros of the first derivative. But think about what that means: now you need the second derivative of the cost in order to find the zeros of the first derivative. That’s more work and doesn’t really give you any advantage.
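To make that concrete, here is a minimal 1-D sketch (with a toy cost function chosen just for illustration) of finding a zero of the first derivative with Newton-Raphson. Notice that each update needs the second derivative as well, and it is still an iterative approximation:

```python
# Toy cost with its minimum at x = 3; derivatives worked out by hand.
def f(x):   return (x - 3.0) ** 4        # cost
def fp(x):  return 4 * (x - 3.0) ** 3    # first derivative
def fpp(x): return 12 * (x - 3.0) ** 2   # second derivative

x = 0.0
for _ in range(50):
    # Newton step on f' (i.e., searching for a zero of the gradient):
    # requires the SECOND derivative in the denominator.
    x = x - fp(x) / fpp(x)
print(x)  # approaches 3.0
```

Even for this simple non-quadratic cost, "set the derivative to zero" does not give a one-step answer; it just trades one iterative problem for another that needs more derivative information.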

In the cases we are dealing with, it just turns out to be simpler and computationally more efficient to apply Gradient Descent directly to the cost (loss) function itself.
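For comparison, here is a minimal gradient descent sketch on the same squared-error setup (the data, learning rate, and iteration count are arbitrary choices for illustration): only the first derivative is needed.

```python
import numpy as np

# Hypothetical data generated from known weights [1.0, 2.0].
rng = np.random.default_rng(1)
X = np.c_[np.ones(100), rng.normal(size=(100, 1))]
y = X @ np.array([1.0, 2.0])

w = np.zeros(2)
alpha = 0.1            # learning rate (illustrative choice)
for _ in range(1000):
    grad = X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error / 2
    w -= alpha * grad                  # step downhill along the gradient
print(w)  # approaches [1.0, 2.0]
```

The update rule uses only the gradient of the cost, which is why the same loop applies unchanged to cost functions that have no closed-form solution.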

2 Likes

Perfect, that makes a lot of sense. Thank you !!

1 Like

I don’t get what is meant by a “closed form” equation here.
Is it saying that the cost function never reaches an absolute 0, and that’s why we can’t apply the derivative?
Can you please explain it like I’m 5 years old? Thanks!

1 Like

By “no closed form”, we mean there is no direct mathematical solution.

There is only one cost function for which you can set the gradients to 0 and solve for the weights. That’s the linear regression cost function.

No other cost function is mathematically solvable in a way that directly computes the best weights, so iterative methods are required for those.

Even when a mathematical solution exists, an iterative solution will often require fewer computations and fewer computer resources.

2 Likes