Local Optima with Gradient Descent

bgoyal · May 29, 2021, 7:10am

In the week 2 lecture on local optima, Dr. Ng says that at a point where the derivative is 0, the function can be either concave or convex.

I assume he's referring to a 2D cross section when he says this. If so, why can't that cross section have a point of inflection there (so the function would be concave on one side and convex on the other, like x^3 at x = 0)? Thanks in advance!
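One way to make the question concrete is the standard second-derivative test: at a point where the gradient is zero, the eigenvalues of the Hessian give the curvature along each cross section. Below is a minimal sketch, assuming a finite-difference Hessian is accurate enough for smooth toy functions; the helper names `numerical_hessian` and `classify_critical_point` and the example functions are hypothetical, not from the course. The 1D case f(x) = x^3 at x = 0 is exactly the inflection point asked about: its derivative is zero, but its curvature is zero and changes sign there, so the test is inconclusive and the point is neither a minimum nor a maximum.

```python
import numpy as np


def numerical_hessian(f, point, h=1e-4):
    """Central finite-difference estimate of the Hessian of f at `point`."""
    point = np.asarray(point, dtype=float)
    n = point.size
    steps = np.eye(n) * h  # row i is a step of size h along coordinate i
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = steps[i], steps[j]
            hess[i, j] = (f(point + ei + ej) - f(point + ei - ej)
                          - f(point - ei + ej) + f(point - ei - ej)) / (4 * h * h)
    return hess


def classify_critical_point(f, point, tol=1e-4):
    """Second-derivative test at a point where the gradient is assumed to be zero."""
    eigvals = np.linalg.eigvalsh(numerical_hessian(f, point))
    if np.all(eigvals > tol):
        return "local minimum: convex along every cross section"
    if np.all(eigvals < -tol):
        return "local maximum: concave along every cross section"
    if np.any(eigvals > tol) and np.any(eigvals < -tol):
        return "saddle point: convex along some directions, concave along others"
    return "inconclusive: zero curvature in some direction (e.g. an inflection point)"


if __name__ == "__main__":
    origin = np.zeros(2)
    print(classify_critical_point(lambda p: p[0]**2 + p[1]**2, origin))  # bowl: local minimum
    print(classify_critical_point(lambda p: p[0]**2 - p[1]**2, origin))  # saddle point
    print(classify_critical_point(lambda p: p[0]**3 + p[1]**2, origin))  # inflection along x: inconclusive
```

The saddle example x^2 - y^2 is the "none of the above" case in two dimensions: the cross section along x is convex and the one along y is concave, so the gradient is zero without the point being a minimum or a maximum.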

paulinpaloalto · May 30, 2021, 9:06pm

You're right that there are more possibilities than just concave or convex: it can also be "none of the above" (for example, a saddle point, or an inflection point where the curvature is zero). But there are two levels to the point here: 1) we really only want to find the local optima where the surface is convex, and 2) these are all "local" optima, meaning that there is no way to guarantee that the point Gradient Descent converges to is the global minimum. There is a lot of deep math here that Prof Ng doesn't have time to cover, and he has also specifically designed these courses to be accessible without requiring a grad-school-level math background. He does make the comment that in "real life" situations, getting stuck in a bad local minimum turns out not to be that serious an issue.
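Point 2) can be seen in a minimal sketch, assuming a made-up one-dimensional cost f(x) = x^4 - 3x^2 + x (chosen only for illustration, not taken from the course) that has a global minimum near x ≈ -1.30 and a worse local minimum near x ≈ 1.13. The same fixed-step gradient descent (the names `f`, `df` and `gradient_descent` and the learning rate are hypothetical choices) ends up in different minima depending only on where it starts, and nothing in the update rule tells it which minimum is the global one.

```python
def f(x: float) -> float:
    """Illustrative non-convex 1D cost with two local minima (an assumption, not from the course)."""
    return x**4 - 3 * x**2 + x


def df(x: float) -> float:
    """Analytic derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 1000) -> float:
    """Plain fixed-step gradient descent on f, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x


if __name__ == "__main__":
    # Different starting points, identical update rule.
    for x0 in (-2.0, 0.0, 0.5, 2.0):
        x_final = gradient_descent(x0)
        print(f"start {x0:+.1f} -> x = {x_final:+.4f}, cost = {f(x_final):+.4f}")
```

The runs started at -2.0 and 0.0 should end near x ≈ -1.30 (cost ≈ -3.51), while the runs started at 0.5 and 2.0 should end near x ≈ 1.13 (cost ≈ -1.07); the dividing line is the local maximum near x ≈ 0.17. Both end points have zero gradient, so from the algorithm's point of view both are equally "converged".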

The reasons for this also involve math beyond the scope of the course, but there are a couple of points that may help even without getting into the actual math:

The global versus local minimum issue is really not a problem: even if we were able to find the real global minimum of the cost on the training set, that probably would not be a good solution because it would represent very serious “overfitting” on the training data.
There has been some really interesting work from Prof Yann LeCun's group showing that for networks with a large enough number of parameters, the local minima are concentrated in a fairly narrow band of cost values, and those minima actually represent pretty good solutions. Here's the paper. I don't claim to have read and understood all of it, but you can get the gist just by reading the abstract.
