Local minimum vs Global minimum in the context of Gradient Descent

Yes, as Raymond says, finding the actual global minimum is probably not possible, but the higher-level point he also makes is that the global minimum is not what we really want in any case, since it would most likely represent extreme overfitting on the training set. Remember that what we really want is balanced performance on the cross-validation and test data, which are not the same data as the training data. Of course we hope that they have very similar statistical properties, but they are different. Here’s a thread from DLS from a while ago that discusses these issues in more detail.
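To make the overfitting point concrete, here is a small sketch. It is not from the course materials, and it swaps in a toy problem where the global minimum of the training cost can actually be computed: a degree-9 polynomial fit to 10 noisy points, solved with least squares. A real neural network cost surface is not convex, so this only illustrates the idea. The function name `compare_fits`, the sine target, the noise level, and the learning rate are all assumptions chosen for illustration.

```python
import numpy as np

def compare_fits(seed=0, degree=9, steps=200, lr=0.05):
    """Toy comparison: exact training-cost minimum vs. a short gradient descent run."""
    rng = np.random.default_rng(seed)

    # Hypothetical data: noisy samples of a sine curve (an assumption for illustration).
    x_train = np.linspace(-1.0, 1.0, 10)
    x_val = rng.uniform(-1.0, 1.0, 200)
    y_train = np.sin(np.pi * x_train) + rng.normal(0.0, 0.3, x_train.shape)
    y_val = np.sin(np.pi * x_val) + rng.normal(0.0, 0.3, x_val.shape)

    # Polynomial features 1, x, x^2, ..., x^degree.
    X_train = np.vander(x_train, degree + 1, increasing=True)
    X_val = np.vander(x_val, degree + 1, increasing=True)

    def mse(X, y, w):
        return float(np.mean((X @ w - y) ** 2))

    # Global minimum of the training MSE (this toy cost is convex, so
    # least squares finds it directly).
    w_global, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    # Plain batch gradient descent on the same cost, stopped after a fixed
    # number of steps, well short of the global minimum.
    w_gd = np.zeros(degree + 1)
    n = len(y_train)
    for _ in range(steps):
        grad = (2.0 / n) * (X_train.T @ (X_train @ w_gd - y_train))
        w_gd -= lr * grad

    return {
        "train_global": mse(X_train, y_train, w_global),
        "val_global": mse(X_val, y_val, w_global),
        "train_gd": mse(X_train, y_train, w_gd),
        "val_gd": mse(X_val, y_val, w_gd),
    }

if __name__ == "__main__":
    for name, value in compare_fits().items():
        print(f"{name}: {value:.4f}")
```

The global-minimum fit passes through every training point, noise included, so its training error is essentially zero. The thing to look at when you run it is the validation error of each fit: the lowest training cost does not by itself tell you which one generalizes better.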
