To find a local minimum of the cost function, we use gradient descent. Why don’t we take the derivative of the cost function directly and set it to zero? That seems like a straightforward way to find the point where the slope of the cost function is zero.

You can do this, but only for one specific form of cost function (the “squared error” cost function that is used in linear regression).

That method is called the normal equation.
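Here is a minimal sketch of what "set the derivative to zero and solve" looks like for linear regression with a single feature. The function name `fit_line_closed_form` and the toy data are made up for illustration. For the squared error cost, setting both partial derivatives to zero gives the slope as the covariance of x and y divided by the variance of x, and the intercept follows from the means:

```python
def fit_line_closed_form(xs, ys):
    """Closed-form least-squares fit of y = w*x + b for one feature."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    # These two sums come from solving dJ/dw = 0 and dJ/db = 0 together.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    w = s_xy / s_xx
    b = y_mean - w * x_mean
    return w, b

# Toy data generated from y = 2x + 1 (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = fit_line_closed_form(xs, ys)
print(w, b)
```

No loop over iterations and no learning rate is needed here: the weights come out of a formula in one pass.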

There is no closed form equation for the best weights for any other cost function.

As Tom says, there is no “closed form” solution in any other case but that one. It’s a perfectly reasonable question to ask why we don’t do the “set the derivative to zero and solve” in order to find the minimum, but it just turns out that it makes things more complicated, not less, in the case that there are no “closed form” solutions. Setting the derivative to zero and trying to solve just gives you another equation that you have to solve with some type of “iterative approximation” method. In that case, you’d probably use the multidimensional analog of Newton-Raphson to find the zeros of the first derivative. But think about what that means: now you need the second derivative of the cost in order to find the zeros of the first derivative. That’s more work and doesn’t really give you any advantage.
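To make that concrete, here is a rough sketch for logistic regression. The notation is my own assumption, not something from the posts above: $\theta$ collects the weights and bias, $x^{(i)}$ is the $i$-th example with a leading 1, and $\sigma$ is the sigmoid.

```latex
% Setting the gradient of the logistic cost to zero:
\nabla J(\theta) = \frac{1}{m} \sum_{i=1}^{m}
    \left( \sigma\!\left(\theta^{T} x^{(i)}\right) - y^{(i)} \right) x^{(i)} = 0

% \theta sits inside the sigmoid, so it cannot be isolated algebraically.
% Newton-Raphson on this equation needs the second derivative (the Hessian H):
H = \frac{1}{m} \sum_{i=1}^{m}
    \sigma\!\left(\theta^{T} x^{(i)}\right)
    \left( 1 - \sigma\!\left(\theta^{T} x^{(i)}\right) \right)
    x^{(i)} {x^{(i)}}^{T},
\qquad
\theta \leftarrow \theta - H^{-1} \nabla J(\theta)

% Gradient descent only needs the first derivative:
\theta \leftarrow \theta - \alpha \, \nabla J(\theta)
```

Both updates are iterative, but the Newton-Raphson step has to build and solve with $H$ on every iteration, while the gradient descent step only needs $\nabla J$.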

In the cases we are dealing with, it just turns out to be simpler and computationally more efficient to apply Gradient Descent directly to the cost (loss) function itself.
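As an illustration, here is a small sketch of gradient descent applied directly to the logistic regression cost with one feature. The function names, the toy data, and the choices `alpha=0.1` and `steps=5000` are all assumptions picked for this example, not values from the course:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_cost(xs, ys, w, b):
    """Average cross-entropy cost for a one-feature logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

def gradient_descent(xs, ys, alpha=0.1, steps=5000):
    """Repeatedly step against the gradient of the cost; no equation solving."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        dw = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dw
        db = sum(errors) / m                             # dJ/db
        w -= alpha * dw
        b -= alpha * db
    return w, b

# Toy data (illustrative only); the labels overlap so the classes are not separable.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
w, b = gradient_descent(xs, ys)
print("cost before:", logistic_cost(xs, ys, 0.0, 0.0))
print("cost after: ", logistic_cost(xs, ys, w, b))
```

Only the first derivative is ever computed, and each step is a couple of sums over the data.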

Perfect, that makes a lot of sense. Thank you!

I don’t get what is meant by a “closed form” equation here.

Does it mean that the cost function can never be exactly 0, and that’s why we can’t apply the derivative?

Can you please explain it like I am a 5-year-old? Thanks.

By “no closed form”, we mean there is no direct mathematical solution.

There is only one cost function for which you can set the gradients to 0 and solve for the weights. That’s the linear regression cost function.

No other cost function can be solved mathematically to compute the best weights directly, so iterative methods are required for them.

Even when a mathematical solution exists, an iterative solution will often require fewer computations and fewer computing resources, especially when there are many features.
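For what it's worth, here is a small sketch showing that gradient descent reaches the same weights as the direct solution on a toy linear regression problem. The function name, the data, and the settings `alpha=0.1` and `steps=2000` are assumptions chosen for this example. It does not demonstrate the cost advantage itself, which only shows up when the number of features is large:

```python
def linear_gradient_descent(xs, ys, alpha=0.1, steps=2000):
    """Gradient descent on the squared error cost for y = w*x + b."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        dw = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dw
        db = sum(errors) / m                             # dJ/db
        w -= alpha * dw
        b -= alpha * db
    return w, b

# Toy data generated from y = 2x + 1 (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = linear_gradient_descent(xs, ys)
print(w, b)  # approaches w = 2, b = 1, the exact least-squares solution
```

With one feature the direct formula is clearly cheaper. With n features, the direct solution has to solve an n-by-n linear system, while each gradient descent step stays a simple pass over the data.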