Exactly. Finding the overall global minimum is actually not what you want in any case, because it would very likely represent extreme overfitting on the training data. It turns out that for sufficiently complex networks, there is a band of local minimum values that are reasonable solutions and are numerous enough that you have a reasonable chance of finding them. This statement is based on work from Yann LeCun’s research group and the paper is linked from this other thread which discusses this same question.
2 Likes