How does gradient descent escape from a saddle point? (And what is random perturbation?)

David_Farago · June 18, 2022, 5:39pm

In the last lecture of DLS Course 2 week 2 about the problems of local optima, Andrew argues that local minima are not a problem because they are much rarer than saddle points, and then depicts how gradient descent enters and then exits a saddle point.

The saddle point is exited “because of a random perturbation”. What does that mean?

I could not find anything about this anywhere – besides Why saddle points isn't a problem for gradient descent? and Cost function stuck at local minima - #3 by kenb, which are also helpful but do not answer my question.

Is it the case that gradient descent never lands on the saddle point exactly, so that being an epsilon away from it is sufficient to “exit” it again after sufficiently many gradient descent iterations? Or does it escape even when it lands on the saddle point exactly, simply because of numerical imprecision? Or is noise added on purpose, precisely so that gradient descent can exit saddle points?

Nicolas · June 18, 2022, 6:28pm

Your guesses are right.

As you said, it is very unlikely that an iteration lands you precisely on the point where all the derivatives are exactly zero. So you will most likely slide down one side or the other of the saddle point (in a two-dimensional picture).
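
To make the "epsilon away" case concrete, here is a minimal sketch. The toy function f(x, y) = x² − y² (saddle point at the origin), the learning rate and the starting points are my own illustrative choices, not something from the lecture.

```python
def grad(x, y):
    # Gradient of the toy function f(x, y) = x**2 - y**2 (an assumption for
    # illustration); it has a saddle point at (0, 0).
    return 2.0 * x, -2.0 * y

def descend(x, y, lr=0.1, steps=200):
    # Plain gradient descent; returns the whole path of visited points.
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
        path.append((x, y))
    return path

exact = descend(1.0, 0.0)    # starts exactly on the ridge y = 0
nearby = descend(1.0, 1e-8)  # starts a tiny epsilon off the ridge

print("exact start: ", exact[-1])   # y stays 0, x creeps toward the saddle
print("nearby start:", nearby[-1])  # y has grown away from the saddle
```

With these settings each step multiplies x by 0.8 and y by 1.2, so the tiny initial offset in y grows geometrically while x shrinks. Note that this toy function is unbounded below, so "escaping" here just means y keeps growing; on a real cost function the descent would continue toward a lower region instead.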

The more variables you have, the lower the chance that, at a critical point, every principal direction (obtained by diagonalizing the Hessian, assuming it is non-degenerate) is of the same type, i.e. that the point is a minimum along all of them or a maximum along all of them. As soon as the directions are mixed, the critical point is a saddle point.
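
A rough way to see the effect of dimension, as a sketch only: classify a critical point by the signs of its Hessian eigenvalues, and assume (a strong simplification of mine, not a property of real networks) that each eigenvalue is independently positive or negative with probability 1/2. The function names below are hypothetical.

```python
from fractions import Fraction

def classify(eigenvalues):
    # Classify a critical point from its Hessian eigenvalues.
    # Assumes a non-degenerate Hessian, i.e. no eigenvalue is zero.
    if all(ev > 0 for ev in eigenvalues):
        return "local minimum"
    if all(ev < 0 for ev in eigenvalues):
        return "local maximum"
    return "saddle point"

def prob_all_same_sign(n):
    # Coin-flip assumption: each of the n eigenvalues is independently
    # positive or negative with probability 1/2. Then "all positive" and
    # "all negative" each have probability 2**-n.
    return Fraction(2, 2 ** n)

# The Hessian of x**2 - y**2 is diag(2, -2): mixed signs.
print(classify([2.0, -2.0]))

for n in (2, 10, 100):
    print(n, float(prob_all_same_sign(n)))
```

Under that assumption the chance of a pure minimum or maximum halves with every extra variable, which is the intuition behind local optima being rare compared to saddle points in high dimensions.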

Hope that helps!

David_Farago · June 19, 2022, 6:56am

Thanks a lot Nicolas.

Is that what Andrew meant by “random perturbation”, or is there another aspect to it (e.g. imprecision on purpose or by accident)?

Nicolas · June 19, 2022, 9:18am

I think it’s that. A random perturbation during optimization means that there is always some small numerical deviation added to the theoretically expected value.
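
As for the "noise on purpose" idea from the question, here is a minimal sketch of what it could look like. It again uses the toy function f(x, y) = x² − y², and the Gaussian noise, its scale and the function name `perturbed_descent` are my own assumptions, not the method from the lecture.

```python
import random

def grad(x, y):
    # Gradient of the toy function f(x, y) = x**2 - y**2, saddle at (0, 0).
    return 2.0 * x, -2.0 * y

def perturbed_descent(x, y, noise_std, lr=0.1, steps=200, seed=0):
    # Gradient descent with a small Gaussian perturbation added to every
    # update. noise_std=0.0 gives plain gradient descent.
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    for _ in range(steps):
        gx, gy = grad(x, y)
        x = x - lr * gx + rng.gauss(0.0, noise_std)
        y = y - lr * gy + rng.gauss(0.0, noise_std)
    return x, y

# Both runs start exactly at the saddle point, where the gradient is zero.
stuck = perturbed_descent(0.0, 0.0, noise_std=0.0)
escaped = perturbed_descent(0.0, 0.0, noise_std=1e-6)

print("no noise:  ", stuck)    # never moves
print("with noise:", escaped)  # y has been pushed off the saddle and grows
```

Without noise the iterate sits on the saddle point forever because every gradient is exactly zero. With even tiny noise, the first perturbation along y is amplified at every step, so the iterate leaves the saddle.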
