How does the Gradient Descent approach reconcile a local minimum with the global minimum? If we really want the global one, how do we avoid getting stuck locally?
As far as I know, finding a general algorithm that always reaches the global optimum is not possible.
I think our first approach should be to construct a loss function that is convex, i.e. one with no (or very few) local optima.
Most models, like neural networks, don’t have a convex loss function, so there is a high chance we may get stuck in a local optimum.
But even if we get stuck in a local optimum, there are a few ways we can try to get out of it (see the sketch after this list):
i) try different initial weights and hope one of them leads to the global optimum
ii) increase the number of iterations
iii) use stochastic gradient descent, since the noise in its updates may help us escape a local optimum.
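For point (i), here is a minimal sketch in plain NumPy (the toy 1-D function, learning rate, and number of restarts are all made up for illustration; a real loss surface is high-dimensional):

```python
# Plain gradient descent on a toy 1-D non-convex function, restarted from
# several random initial weights. f(w) = w**4 - 3*w**2 + w has two local
# minima, so only some starting points reach the lower (global) one.
import numpy as np

def f(w):
    return w**4 - 3 * w**2 + w

def grad_f(w):
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w0, lr=0.01, steps=1000):
    w = w0
    for _ in range(steps):
        w -= lr * grad_f(w)
    return w

rng = np.random.default_rng(0)
candidates = []
for _ in range(10):                      # 10 random restarts
    w0 = rng.uniform(-2.0, 2.0)          # different initial weight each time
    w_final = gradient_descent(w0)
    candidates.append((f(w_final), w_final))

best_loss, best_w = min(candidates)      # keep the restart with the lowest loss
print("best w = %.3f, loss = %.3f" % (best_w, best_loss))
```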
correct me if I’m wrong
Yes, this reminds me of the previous edition of the course. The first part of your answer implies trial and error, and luck!
But how do we know which one is the global minimum? There can be ‘n’ local minima; identifying the global minimum among them could be a never-ending task, right?
No, we won’t know until we explicitly compare all of their cost values, which, as you said, could be a never-ending task. So we seldom, if ever, really target the global minimum; instead, I think we want a stable and low enough local minimum. A low enough local minimum is one that gives us the best metric performance. The way to get there is by tuning hyperparameters, including initializing our neural network weights differently, and then seeing which hyperparameter configuration gives us the best metric performance on the CV dataset.
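For example, a rough sketch of that selection loop in plain NumPy (the random dataset, learning rates, seeds, and the tiny logistic-regression model below are all made-up placeholders, not code from the course):

```python
# Train the same tiny model under several hyperparameter configurations, each
# with a different random weight initialization, and keep whichever
# configuration scores best on the cross-validation split.
import numpy as np

rng = np.random.default_rng(42)
X_train, y_train = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)
X_cv, y_cv = rng.normal(size=(100, 5)), rng.integers(0, 2, 100)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr, seed, epochs=500):
    w = np.random.default_rng(seed).normal(scale=0.1, size=X.shape[1])  # init depends on seed
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)   # gradient of the logistic loss
        w -= lr * grad
    return w

def accuracy(w, X, y):
    return np.mean((sigmoid(X @ w) > 0.5) == y)

best = None
for lr in (0.01, 0.1, 1.0):          # hyperparameter grid (placeholder values)
    for seed in (0, 1, 2):           # different weight initializations
        w = train(X_train, y_train, lr, seed)
        score = accuracy(w, X_cv, y_cv)              # judge each config on the CV set
        if best is None or score > best[0]:
            best = (score, lr, seed)

print("best CV accuracy %.3f with lr=%s, init seed=%s" % best)
```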
Cheers,
Raymond
Yes, as Raymond says, finding the actual global minimum is probably not possible, but the higher-level point he also makes is that it is not what we really want in any case, since it would most likely represent extreme overfitting on the training set. Remember that what we really want is balanced performance on the cross validation and test data, which is not the same data as the training data. Of course we hope that it has very similar statistical properties, but it is different. Here’s a thread from DLS from a while ago that discusses these issues in more detail.
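To make the overfitting point concrete, here is a tiny made-up illustration (noisy sine data and polynomial fits, not code from the course): the more flexible model drives its training error toward zero, yet typically does worse on the held-out data.

```python
# Fit a modest and a very flexible polynomial to 15 noisy points and compare
# training error with error on a held-out (cross-validation) set.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 15))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=15)
x_cv = np.sort(rng.uniform(0, 1, 100))
y_cv = np.sin(2 * np.pi * x_cv) + rng.normal(scale=0.2, size=100)

for degree in (3, 12):                                 # modest vs. very flexible model
    coeffs = np.polyfit(x_train, y_train, degree)      # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    cv_mse = np.mean((np.polyval(coeffs, x_cv) - y_cv) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, cv MSE {cv_mse:.4f}")
```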