Saddle Point clarification

Anbu · July 17, 2021, 3:31pm

Hi Sir,

@paulinpaloalto @bahadir @eruzanski @Carina @neurogeek @lucapug @javier @kampamocha

We have a couple of doubts, sir. Could you please help clarify them?

In a very high-dimensional parameter space, are there multiple saddle points, or do we always end up with only one saddle point?
Why does the algorithm need to get off the plateau region? At a saddle point the gradient is zero, so doesn't that mean we have reached the global minimum? If we have reached the global minimum, why does the algorithm need to get off the plateau region?

paulinpaloalto · July 18, 2021, 3:29am

It sounds like you don’t understand what is meant by a “saddle point”. Here’s the Wikipedia article on the subject. The point (pun partially intended) is that a saddle point is not a local extremum at all: it is a point on the cost surface at which the gradient is zero, but it is neither a local minimum nor a local maximum, because the surface curves upward in some directions and downward in others. So finding this point actually does us no good, which is why it is important to move off this region.
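To make that concrete, here is a minimal sketch using the textbook toy surface f(x, y) = x^2 - y^2, which is my own choice of example and not something from the course materials. It checks that the gradient at the origin is zero, that the Hessian has eigenvalues of both signs (the usual test for a saddle), and that plain gradient descent stays stuck when started exactly at the origin but drifts away after a tiny nudge. The learning rate, step count, and nudge size are arbitrary assumptions.

```python
import numpy as np

def f(p):
    """Toy cost surface with a saddle at the origin (illustrative only)."""
    x, y = p
    return x**2 - y**2

def grad_f(p):
    """Analytic gradient of f."""
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

# The Hessian of f is constant: curvature +2 along x, -2 along y.
HESSIAN = np.array([[2.0, 0.0],
                    [0.0, -2.0]])

origin = np.array([0.0, 0.0])
grad_at_origin = grad_f(origin)

# Mixed-sign eigenvalues at a zero-gradient point indicate a saddle,
# not a minimum or a maximum.
eigenvalues = np.linalg.eigvalsh(HESSIAN)
is_saddle = bool(eigenvalues.min() < 0 < eigenvalues.max())

def gradient_descent(start, lr=0.1, steps=50):
    """Plain gradient descent; lr and steps are arbitrary choices."""
    p = np.array(start, dtype=float)
    for _ in range(steps):
        p = p - lr * grad_f(p)
    return p

# Started exactly on the saddle, the update is zero and nothing moves.
stuck = gradient_descent([0.0, 0.0])

# A tiny nudge along the downhill direction grows at every step.
escaped = gradient_descent([0.0, 1e-6])

print("gradient at origin:", grad_at_origin)
print("Hessian eigenvalues:", eigenvalues)
print("is saddle:", is_saddle)
print("from exact saddle:", stuck, "cost:", f(stuck))
print("from nudged start:", escaped, "cost:", f(escaped))
```

Note that this toy surface is unbounded below, so the nudged run simply keeps decreasing the cost. On a real cost surface, the hope is that leaving the saddle leads toward a region with a lower cost.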

In the very high dimensional spaces that we are dealing with, there are very large numbers of local extrema and saddle points. But it turns out that most local extrema that we find with Gradient Descent are likely to be decent solutions. The math behind this is not simple, but here is another thread that discusses the general question of non-convexity and also refers to the paper from Yann LeCun’s group that proves this.
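One rough way to build intuition for why saddle points become so common as the dimension grows is the following toy experiment. It is only a sketch under a strong assumption: it treats the Hessian at a zero-gradient point as a random symmetric Gaussian matrix, which real network Hessians are not. It is also not the argument from the paper mentioned above. For a point to be a local minimum, every Hessian eigenvalue must be positive, and the experiment estimates how often that happens by chance. The function name, trial count, and seed are hypothetical choices.

```python
import numpy as np

def fraction_all_positive(dim, trials=2000, seed=0):
    """Estimate how often a random symmetric matrix has only positive
    eigenvalues, i.e. would look like a local minimum rather than a saddle.
    The random-matrix model is an assumption made for illustration."""
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(trials):
        a = rng.standard_normal((dim, dim))
        h = (a + a.T) / 2.0  # symmetrize so it can stand in for a Hessian
        if np.linalg.eigvalsh(h).min() > 0:
            count += 1
    return count / trials

# Compare a few small dimensions; real models have far more parameters.
fractions = {dim: fraction_all_positive(dim) for dim in (1, 2, 4, 8)}

for dim, frac in fractions.items():
    print(f"dim={dim}: fraction with all positive eigenvalues = {frac:.4f}")
```

In this toy model the fraction drops quickly as the dimension increases, which suggests why, with millions of parameters, a zero-gradient point is far more likely to have at least one downhill direction than to be a true local minimum.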

Anbu · July 19, 2021, 8:34am

Thank you very much, sir.
