What happens if the cost function has many local optima, or is not convex? What will gradient descent do: will it get stuck at a local optimum, or will it reach the global optimum?
The result depends on the choice of the initial point and the learning rate. For a sufficiently small learning rate, GD will converge to a local optimum determined by the initial point.
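To make that concrete, here is a minimal sketch in plain NumPy (the non-convex function, learning rate, and starting points are illustrative choices, not anything specific to a neural network) showing that plain gradient descent ends up in different local minima depending on where it starts:

```python
import numpy as np

# A non-convex 1-D cost with two local minima (one of them global).
def f(x):
    return x**4 - 3 * x**2 + x

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=1000):
    # Plain gradient descent: repeatedly step downhill from x0.
    x = x0
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x

# Same algorithm, same learning rate, different starting points:
# one run lands in the global minimum (near x = -1.3),
# the other gets stuck in the shallower local minimum (near x = 1.1).
for x0 in (-2.0, 2.0):
    x_final = gradient_descent(x0)
    print(f"start {x0:+.1f} -> x = {x_final:+.4f}, f(x) = {f(x_final):+.4f}")
```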
It’s an interesting and important question, and the answers are not straightforward:
The cost functions for neural networks are not convex. There is no guarantee that you’ll ever find the global minimum, but that may not even be desirable: most likely it would represent extreme overfitting to the training data. If you choose your gradient descent algorithm and parameters correctly, you can usually find one of the very many local minima that give a reasonable solution.
Here’s a thread which discusses this in more detail.