Why do I need to use the gradient descent method when I could calculate all the possible values of the cost function with a predefined step and then search for the minimum? Is it efficiency?
Yes, it’s an efficiency question. Bear in mind that the variables here are the “parameters”, meaning all the weights and bias values in your network. Those are the values that you can vary in order to find the minimum cost. In a typical neural network, there may be thousands, millions, or even billions of those parameters. So how would you construct your algorithm to enumerate and search all the possible cost values in the case that you have, say, 10 million parameters, each of which is a real number with no limits on its value? The number of grid points is the number of steps per parameter raised to the power of the number of parameters, so the search space is enormous even with a coarse step.
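To put a rough number on that, here is a minimal sketch that counts how many cost evaluations a full grid search would need. The function name `log10_grid_evaluations` and the choice of 10 steps per parameter are just assumptions for illustration. It works with the base-10 logarithm of the count, because the count itself is far too large to handle directly.

```python
import math


def log10_grid_evaluations(n_params: int, steps_per_param: int) -> float:
    """Return log10 of the number of grid points in an exhaustive search.

    A grid with `steps_per_param` candidate values for each of `n_params`
    parameters has steps_per_param ** n_params points, so its log10 is
    n_params * log10(steps_per_param).
    """
    return n_params * math.log10(steps_per_param)


if __name__ == "__main__":
    # Hypothetical, very coarse grid: only 10 candidate values per parameter.
    steps = 10
    for n_params in (1, 2, 10, 10_000_000):
        exponent = log10_grid_evaluations(n_params, steps)
        print(f"{n_params:>10} parameters -> about 10^{exponent:.0f} cost evaluations")
```

Gradient descent, by contrast, needs one gradient computation per step no matter how many combinations of parameter values exist.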
The other question you could ask is “why don’t we just take the derivative, set it to zero, and then solve to find the minimum?” That would be a legitimate mathematical approach, but the problem is that it just gives you another set of equations (one per parameter) that in general is not solvable in closed form. So now you need yet another iterative approximation method to find the zeros of the gradient. That would end up looking like the multidimensional equivalent of Newton-Raphson, which implies that you need the second derivatives (the Hessian matrix) in order to find the zeros of the first derivative (the gradient). So that just makes the problem more complicated than directly applying Gradient Descent to the cost function.
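As a small illustration of that trade-off, here is a sketch on a one-parameter toy cost, f(w) = exp(w) + w², which I picked only because setting its derivative exp(w) + 2w to zero has no elementary closed-form solution. It is not a neural network cost, and the learning rate, step counts, and function names are all assumptions. Gradient descent uses only the first derivative, while Newton-Raphson on the gradient also needs the second derivative.

```python
import math


def grad(w: float) -> float:
    """First derivative of the toy cost f(w) = exp(w) + w**2."""
    return math.exp(w) + 2.0 * w


def second_derivative(w: float) -> float:
    """Second derivative of the toy cost; only Newton-Raphson needs it."""
    return math.exp(w) + 2.0


def gradient_descent(w: float, learning_rate: float = 0.1, steps: int = 200) -> float:
    """Step against the gradient; uses first-derivative information only."""
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


def newton_raphson(w: float, steps: int = 10) -> float:
    """Find a zero of the gradient; each step divides by the second derivative."""
    for _ in range(steps):
        w -= grad(w) / second_derivative(w)
    return w


if __name__ == "__main__":
    w_gd = gradient_descent(0.0)
    w_newton = newton_raphson(0.0)
    print("gradient descent:", w_gd, "gradient there:", grad(w_gd))
    print("Newton-Raphson:  ", w_newton, "gradient there:", grad(w_newton))
```

With one parameter the second derivative is a single number. With n parameters it becomes an n × n matrix, which is where the extra complication comes from when n is in the millions.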
Great answer. Thanks!