Why can't we just set the derivative of the cost function to zero to find the local minima, instead of using the gradient descent algorithm?
It’s a natural question, but it turns out doing that doesn’t help: it just moves the difficulty elsewhere. Setting the derivative to zero generally gives you an equation that you can’t solve in “closed form”, so you then need yet another iterative approximation method to estimate the zeros of the derivative. For example, you could use the multi-dimensional analog of Newton-Raphson (which approximates the zeros of a function), but that would require computing the second partial derivatives of the cost, i.e. the Hessian. It’s simpler just to apply gradient descent or Conjugate Gradient methods to the cost function directly.
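To make this concrete, here is a one-dimensional toy sketch (the cost function is made up for illustration, not tied to any particular model). For the cost x² + eˣ, the stationarity equation 2x + eˣ = 0 is transcendental and has no closed-form solution, so you end up iterating either way: gradient descent uses only the first derivative, while Newton-Raphson on the gradient also needs the second derivative.

```python
import math

def cost(x):
    # Toy cost: its gradient equation 2x + e^x = 0 has no closed-form root.
    return x**2 + math.exp(x)

def grad(x):
    # First derivative of the cost.
    return 2 * x + math.exp(x)

def hess(x):
    # Second derivative, needed only by Newton-Raphson.
    return 2 + math.exp(x)

# Gradient descent: iterate using the first derivative alone.
x = 0.0
lr = 0.1
for _ in range(200):
    x -= lr * grad(x)

# Newton-Raphson applied to the gradient: finds a zero of grad,
# but each step requires the second derivative as well.
y = 0.0
for _ in range(20):
    y -= grad(y) / hess(y)

# Both converge to the same stationary point (approximately -0.3517),
# which no algebraic rearrangement could have given in closed form.
print(x, y)
```

Both loops land on the same minimizer; the point is that even after "setting the derivative to zero" you still need an iterative scheme, and the Newton-style one demands strictly more derivative information per step.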
Thanks! I appreciate the help.