About local minimum in NN


I’m learning Machine Learning right now. Since the weights and biases are randomly initialized at the beginning of training a neural network, is it possible that with gradient descent optimization the cost function gets stuck at a local minimum instead of the global minimum? I don’t think this was mentioned in the class.


Yes, gradient descent on any NN can end up in a local minimum. But it is not because of the initialization itself. It is because the NN cost function is not convex, so it can have many local minima.

Thanks, I understand that the cost function is not convex, so in an unlucky case it gets stuck in a local minimum. But how do we avoid this? Train multiple times? With gradient descent, I think the result depends on the initial weight and bias values (the starting point), so I feel like if we try multiple runs with random initialization, it could be avoided. What’s the most efficient and effective way to avoid this issue?

Yes, you can train multiple times with different random initializations. Often you do not really need the global minimum, just one that is good enough.
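Here is a minimal sketch of that random-restart idea on a toy 1-D non-convex function (the function, learning rate, and restart count are all made up for illustration; a real NN cost surface is high-dimensional, but the mechanism is the same):

```python
import random

# Toy non-convex "loss" with two minima: a local minimum near x = -0.5
# and the global minimum at x = 2.  (Stand-in for an NN cost surface.)
def loss(x):
    return x**4 - 2*x**3 - 2*x**2 + 5

def grad(x):
    # Analytic derivative of loss(x).
    return 4*x**3 - 6*x**2 - 4*x

def gradient_descent(x0, lr=0.01, steps=2000):
    # Plain gradient descent from starting point x0.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Starting points in (-3, 0) converge to the local minimum near -0.5;
# starting points in (0, 3) converge to the global minimum at 2.
# Random restarts: run from several random initial points, keep the best.
random.seed(0)
restarts = [gradient_descent(random.uniform(-3, 3)) for _ in range(10)]
best_x = min(restarts, key=loss)
```

With ten restarts it is very likely at least one starting point lands in the basin of the global minimum, so `best_x` ends up near 2 even though some individual runs get stuck near -0.5.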


Exactly. Finding the overall global minimum is actually not what you want in any case, because it would very likely represent extreme overfitting on the training data. It turns out that for sufficiently complex networks, there is a band of local minima that are reasonable solutions and are numerous enough that you have a reasonable chance of finding one of them. This statement is based on work from Yann LeCun’s research group; the paper is linked from this other thread, which discusses this same question.


Thank you for your detailed explanation and the link to the other thread! Now I get it, I really appreciate it!

This was a super helpful reply, thanks!
