Why don't we get the minimum of a function mathematically instead of running gradient descent?

Suppose our cost function is J(w, b).

Why can’t we just set J′(w, b) = 0?

That is, set the derivative of the cost function to zero, find all the values that satisfy the equation, and choose the one that gives the smallest cost.

If J is a low-degree polynomial, that might be easy to do, but if it's a complex multidimensional function it's very hard to achieve. Also, solving the resulting system of equations can be more computationally expensive than the iterative gradient-descent approach.
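To make the contrast concrete, here is a minimal gradient-descent sketch in NumPy for a one-feature linear model with squared-error cost (the data, learning rate, and iteration count are illustrative, not from the original post):

```python
import numpy as np

# Toy data generated from y = 2x + 1, so the minimiser is w = 2, b = 1
x = np.linspace(0.0, 1.0, 50)
y = 2 * x + 1

# Gradient descent on J(w, b) = mean((w*x + b - y)^2)
w, b, alpha = 0.0, 0.0, 0.1
for _ in range(5000):
    err = w * x + b - y
    w -= alpha * 2 * np.mean(err * x)  # dJ/dw
    b -= alpha * 2 * np.mean(err)      # dJ/db

print(w, b)  # approaches w = 2, b = 1
```

Note that we never solve J′(w, b) = 0 directly; we only evaluate the gradient repeatedly and step downhill, which scales to cost functions where the equation has no closed-form solution.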

For linear regression, that closed-form method is called the normal equation.

It is only practical for small feature counts due to the computational complexity (solving the normal equation involves an n x n system in the number of features, roughly O(n³)).
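For completeness, here is a minimal normal-equation sketch in NumPy, assuming the same illustrative one-feature linear model (the data and variable names are made up for the example):

```python
import numpy as np

# Toy data generated from y = 2x + 1, so the exact solution is b = 1, w = 2
x = np.linspace(0.0, 1.0, 50)
y = 2 * x + 1

# Design matrix with a bias column of ones, parameters theta = [b, w]
X = np.column_stack([np.ones_like(x), x])

# Normal equation: solve (X^T X) theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)

print(theta)  # approaches [1, 2], i.e. b = 1, w = 2
```

Using `np.linalg.solve` avoids explicitly inverting XᵀX, but the cost still grows quickly with the number of features, which is why gradient descent is preferred at scale.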


You might find this article useful:


A great article. Thank you!