I’m learning machine learning right now. Since the weights and biases are randomly initialized at the beginning of training a neural network, is it possible that with gradient descent optimization the cost function gets stuck at a local minimum instead of the global minimum? I don’t think this was mentioned in the class.
Thanks, I understand that the cost function is not convex, so in an unlucky case it could get stuck in a local minimum. But how do we avoid this? Train multiple times? With gradient descent, the outcome depends on the initial weight and bias values (the starting point), so I feel like trying multiple runs with different random initializations could avoid the problem. What’s the most efficient and effective way to avoid this issue?
Exactly. Finding the overall global minimum is actually not what you want in any case, because it would very likely represent extreme overfitting on the training data. It turns out that for sufficiently complex networks, there is a band of local minima that are good solutions and are numerous enough that you have a reasonable chance of finding one of them. This statement is based on work from Yann LeCun’s research group, and the paper is linked from this other thread which discusses this same question.
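Just to add some intuition on the “train multiple times with random initializations” idea: here is a small toy sketch (my own NumPy example, not from the course or the paper) that runs plain gradient descent on a simple non-convex 1D function from several random starting points. Different starts can settle into different local minima, and keeping the best of a few restarts is one simple, if somewhat expensive, mitigation.

```python
import numpy as np

# Toy non-convex "cost" with several local minima (illustrative example only):
# f(w) = sin(3w) + 0.5 * w^2
def cost(w):
    return np.sin(3 * w) + 0.5 * w ** 2

def grad(w):
    return 3 * np.cos(3 * w) + w

def gradient_descent(w0, lr=0.01, steps=500):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)   # plain gradient descent update
    return w

rng = np.random.default_rng(0)
results = []
for _ in range(5):                    # "train" 5 times with random restarts
    w0 = rng.uniform(-3, 3)           # random initialization (the starting point)
    w_final = gradient_descent(w0)
    results.append((w0, w_final, cost(w_final)))

for w0, w_final, c in results:
    print(f"start {w0:+.2f} -> ends at w = {w_final:+.2f}, cost = {c:.3f}")

# Different starting points can end in different local minima, but several of
# them reach similarly low cost values; keep the best one if you do restarts.
best = min(results, key=lambda r: r[2])
print("best of the restarts:", best)
```

In practice, though, per the point above, you rarely need to hunt for the global minimum: any of the reasonably low minima tends to be good enough.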