As Tom says, there is no “closed form” solution in any other case but that one. It’s a perfectly reasonable question to ask why we don’t just use the “set the derivative to zero and solve” approach to find the minimum, but it turns out that when there is no “closed form” solution, doing so makes things more complicated, not less. Setting the derivative to zero just gives you another equation, which you then have to solve with some kind of “iterative approximation” method.

In that case, you’d probably use the multidimensional analog of Newton-Raphson to find the zeros of the first derivative (the gradient). But think about what that means: now you need the second derivative of the cost (the Hessian, i.e. the matrix of all second partial derivatives) in order to find the zeros of the first derivative. That’s more work, and for the problems we’re dealing with it doesn’t really give you any advantage.
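
To make the comparison concrete, here are the two update rules written out in their standard textbook forms. The notation is mine, not from the thread: θ is the parameter vector, J(θ) the cost, ∇J its gradient, ∇²J its Hessian, and α the learning rate.

$$
\theta_{k+1} = \theta_k - \left[\nabla^2 J(\theta_k)\right]^{-1} \nabla J(\theta_k) \qquad \text{(Newton-Raphson applied to } \nabla J(\theta) = 0\text{)}
$$

$$
\theta_{k+1} = \theta_k - \alpha \, \nabla J(\theta_k) \qquad \text{(gradient descent applied to } J \text{ directly)}
$$

With n parameters the Hessian is an n × n matrix, so every Newton step has to build it and solve a linear system with it (O(n³) work in general), while a gradient descent step only needs the length-n gradient.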

In the cases we are dealing with, it just turns out to be simpler and computationally more efficient to apply Gradient Descent directly to the cost (loss) function itself.
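
Here is a small sketch of both approaches side by side. I’m assuming logistic regression with the cross-entropy cost as the example, since its minimum has no closed form. The tiny dataset and all the function names are made up for illustration. Gradient descent only ever calls `gradient`, while Newton-Raphson on ∇J = 0 also needs `hessian` and a linear solve at every step.

```python
import math

# Hypothetical 1-D dataset; the classes overlap, so the cost has a finite
# minimizer but no closed-form formula for it.
XS = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
YS = [0, 0, 1, 1, 0, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(w, b):
    """Mean cross-entropy (log) loss of the model sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(XS, YS):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(XS)


def gradient(w, b):
    """First derivative of the cost: (dJ/dw, dJ/db)."""
    gw = gb = 0.0
    for x, y in zip(XS, YS):
        err = sigmoid(w * x + b) - y
        gw += err * x
        gb += err
    m = len(XS)
    return gw / m, gb / m


def hessian(w, b):
    """Second derivatives of the cost (H_ww, H_wb, H_bb); only Newton needs these."""
    hww = hwb = hbb = 0.0
    for x in XS:
        p = sigmoid(w * x + b)
        s = p * (1 - p)
        hww += s * x * x
        hwb += s * x
        hbb += s
    m = len(XS)
    return hww / m, hwb / m, hbb / m


def gradient_descent(alpha=0.5, steps=5000):
    """Gradient descent applied directly to the cost: uses only the gradient."""
    w = b = 0.0
    for _ in range(steps):
        gw, gb = gradient(w, b)
        w, b = w - alpha * gw, b - alpha * gb
    return w, b


def newton_on_gradient(steps=20):
    """Newton-Raphson on grad J = 0: each step also builds and inverts the Hessian."""
    w = b = 0.0
    for _ in range(steps):
        gw, gb = gradient(w, b)
        hww, hwb, hbb = hessian(w, b)
        det = hww * hbb - hwb * hwb
        # Solve H @ (dw, db) = (gw, gb) using the explicit 2x2 inverse.
        dw = (hbb * gw - hwb * gb) / det
        db = (-hwb * gw + hww * gb) / det
        w, b = w - dw, b - db
    return w, b


if __name__ == "__main__":
    w_gd, b_gd = gradient_descent()
    w_nt, b_nt = newton_on_gradient()
    print("gradient descent:", w_gd, b_gd, cost(w_gd, b_gd))
    print("newton-raphson:  ", w_nt, b_nt, cost(w_nt, b_nt))
```

Both should land on the same minimum for this toy problem. With two parameters the Hessian is only 2 × 2, so Newton looks cheap here. The extra cost shows up as the number of parameters grows, because the Hessian grows quadratically and the linear solve grows faster still, while the gradient descent step stays a single vector update.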