I am having trouble visualizing the difference between saddle points and local optima. I am watching the video “The Problem of Local Optima”. The examples of a saddle point and a local optimum shown in the video seem the same to me. Does anyone have a visual or real-world example so I can picture this better?
The picture is pretty easy to see, as shown in the lecture. The real question is what Gradient Descent does when it hits or gets close to a saddle point. My interpretation is that from a saddle point or a local maximum, there are still directions in which the cost decreases, right? Whereas that is not true for a local minimum. So gradient descent should be able to move on and not get stuck, unless you are incredibly unlucky and land exactly on the point at which the gradient is zero in every direction. But one hopes the probability of that is pretty low …
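To make that concrete, here's a toy example of my own (not from the lecture): f(x, y) = x² − y² has a saddle point at the origin, since it's a minimum along x and a maximum along y. Plain gradient descent slides off the saddle unless it starts exactly on the y = 0 axis:

```python
def grad(x, y):
    # gradient of f(x, y) = x^2 - y^2
    return 2 * x, -2 * y

def descend(x, y, lr=0.1, steps=200):
    """Plain gradient descent from (x, y)."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

# A tiny perturbation off the axis: x -> 0, but |y| grows every step,
# so GD escapes the saddle and keeps reducing f.
print(descend(1.0, 1e-6))

# Starting exactly on y = 0 (measure-zero bad luck): GD converges
# straight to the saddle point and "gets stuck" there.
print(descend(1.0, 0.0))
```

The only way to get trapped is to land exactly on the zero-gradient set, which is why the "incredibly unlucky" caveat above matters so little in practice.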
In terms of actually visualizing any of this the way it really happens, it’s just hopeless. Meaning that we’re typically dealing with literally hundreds of dimensions at a minimum, and it’s not at all unusual to have thousands or even millions of parameters, right? And if you really want to get crazy, it is claimed that GPT-4 has 1.7 trillion parameters. What do things look like in 1.7-trillion-dimensional space?
So we’re left with just visualizing things in 3D, which is all our human brains can handle. That corresponds to 2 parameters, which is pretty pathetic, but we hope that the intuition I stated in the first paragraph still applies.
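If you want to generate those 2-parameter pictures yourself, here's a quick sketch using matplotlib (my own example, not a course asset): it plots a bowl (local minimum) next to a saddle so you can see the difference side by side.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for an interactive window
import matplotlib.pyplot as plt

x = np.linspace(-2, 2, 50)
X, Y = np.meshgrid(x, x)

fig = plt.figure(figsize=(10, 4))
surfaces = [
    (X**2 + Y**2, "local minimum: $x^2 + y^2$"),   # bowl: up in every direction
    (X**2 - Y**2, "saddle point: $x^2 - y^2$"),    # up along x, down along y
]
for i, (Z, title) in enumerate(surfaces):
    ax = fig.add_subplot(1, 2, i + 1, projection="3d")
    ax.plot_surface(X, Y, Z, cmap="viridis")
    ax.set_title(title)
fig.savefig("saddle_vs_minimum.png")
```

Rotating the saddle surface interactively makes the key point obvious: at (0, 0) the gradient is zero in both cases, but the saddle still has a downhill escape route along y.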
FWIW here’s a paper from Yann LeCun’s group, which has some math showing that for sufficiently complex models, there exist lots of local minima that are reasonably good solutions. I don’t claim to understand the math, but please have a look. They do show some nice 3D pictures. Here’s a thread with some more discussion in addition to the link to that paper and some other links that may be worth a look.
Thanks for clearing that up for me, it makes sense now!