Why can't we just set the derivative of the cost function to zero to find the local minimum, instead of using the gradient descent algorithm?

It's a natural question, but it turns out that doing so doesn't help: it just makes the problem more complicated. The reason is that setting the derivative to zero just gives you an equation that you can't solve in "closed form", so you then need yet another iterative approximation method to estimate the zeros of the derivative. For example, you could use the multi-dimensional analog of Newton-Raphson (which approximates the zeros of a univariate function), but that would require computing the second (partial) derivatives of the cost. It's simpler just to apply gradient descent or Conjugate Gradient methods to the cost function directly.
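To make that concrete, here's a minimal one-variable sketch (using a made-up cost f(x) = eˣ + x², chosen so that f′(x) = eˣ + 2x = 0 has no algebraic solution). Both routes end up iterative anyway, and Newton-Raphson on f′ needs the extra second derivative:

```python
import math

# Toy cost: f(x) = exp(x) + x**2.
# Its derivative f'(x) = exp(x) + 2x = 0 has no closed-form
# solution, so "just set the derivative to zero" still
# requires an iterative method.

def f_prime(x):
    return math.exp(x) + 2 * x

def f_second(x):
    return math.exp(x) + 2

# Route 1: gradient descent on f -- only first derivatives needed.
x = 0.0
for _ in range(200):
    x -= 0.1 * f_prime(x)
gd_min = x

# Route 2: Newton-Raphson to find the zero of f' -- needs f'' too.
x = 0.0
for _ in range(20):
    x -= f_prime(x) / f_second(x)
newton_min = x

print(gd_min, newton_min)  # both converge to the same minimizer
```

Both loops land on the same point (where f′ vanishes), but the Newton route pays for second-derivative information; in many dimensions that means computing and inverting a Hessian matrix, which is exactly the extra cost the answer above alludes to.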


Thanks! I appreciate the help.