Wandering Around Minimum

Anbu · July 19, 2021, 8:47am

Hi Mentors,

@paulinpaloalto @bahadir @eruzanski @Carina @neurogeek @lucapug @javier @kampamocha @nramon

We have a couple of doubts. Can you please help answer them?

Suppose the algorithm is wandering around the minimum and is not able to converge. How can I detect this in practice in a high-dimensional space (for example, by plotting graphs or by monitoring some value)?
Suppose the algorithm is stuck in a bad local optimum. How can we tell in practice? Here is my answer; is it correct? If the algorithm is stuck in a local optimum, the gradients will not be close to zero.

paulinpaloalto · July 19, 2021, 1:57pm

You can tell if it is not converging because the cost does not go down any more. In that case, try a different set of random initial conditions.

Generally speaking, if the cost stabilizes, then you have a solution. Whether that is good enough or not is evaluated by looking at the training and test accuracy that you get. If the solution gives good values for those, then it’s fine. If not, then you need to try again either with different initialization or with different hyperparameters. I gave you the link to the famous Yann LeCun paper on your “saddle point” thread which explains why the problem of bad local optima usually is not that much of a problem.

paulinpaloalto · July 20, 2021, 12:46am

I only indirectly answered your last question:

The direct answer is the gradients will be zero at any local minimum or maximum or saddle point. So the fact that the gradients are zero doesn’t tell you whether the point you found is useful or not. The way you tell that is what I said above: you calculate the training and test accuracy with the model that is produced by that point at which the gradients are zero. The performance you get at that point is either good enough or it’s not. That’s how you tell a “bad local optimum” from a “good local optimum”.
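
As a small illustration of that point, here is a sketch with three toy cost functions of two parameters. The functions, the `TOY_COSTS` table and the `classify_critical_point` helper are all hypothetical, chosen only because their gradients and Hessians are easy to write by hand. The gradient is exactly zero at the origin in all three cases, yet the Hessian eigenvalues show that one point is a minimum, one is a maximum and one is a saddle point. In a real network you would not normally compute the Hessian; you would judge the point by the training and test accuracy, as described above.

```python
import numpy as np

# Toy costs J(x, y); each entry holds (gradient function, constant Hessian).
# These functions are illustrative assumptions, not anything from the course.
TOY_COSTS = {
    # J = x^2 + y^2
    "bowl": (lambda p: np.array([2.0, 2.0]) * p, np.diag([2.0, 2.0])),
    # J = -(x^2 + y^2)
    "hill": (lambda p: np.array([-2.0, -2.0]) * p, np.diag([-2.0, -2.0])),
    # J = x^2 - y^2
    "saddle": (lambda p: np.array([2.0, -2.0]) * p, np.diag([2.0, -2.0])),
}

def classify_critical_point(grad_fn, hessian, point, tol=1e-8):
    """Classify `point` using the gradient norm and the Hessian eigenvalues."""
    if np.linalg.norm(grad_fn(point)) > tol:
        return "not a critical point"
    eigenvalues = np.linalg.eigvalsh(hessian)
    if np.all(eigenvalues > tol):
        return "local minimum"
    if np.all(eigenvalues < -tol):
        return "local maximum"
    if eigenvalues.min() < -tol and eigenvalues.max() > tol:
        return "saddle point"
    return "inconclusive"

origin = np.zeros(2)
for name, (grad_fn, hessian) in TOY_COSTS.items():
    grad_norm = np.linalg.norm(grad_fn(origin))
    label = classify_critical_point(grad_fn, hessian, origin)
    print(name, grad_norm, label)
```

The gradient norm printed for each case is the same, which is why a zero gradient on its own says nothing about the quality of the point.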

Anbu · July 22, 2021, 8:15am

@paulinpaloalto Thanks sir

But I need a bit of clarity on the first doubt: when the algorithm is wandering around the minimum, how would the cost value behave in real time?

And one more question, sir: how can I conclude that the algorithm has converged to the global minimum?

paulinpaloalto · July 22, 2021, 2:45pm

If gradient descent is not converging, then the cost will either diverge (grow larger) or oscillate as opposed to moving monotonically lower.
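
One rough way to check this from the recorded cost values is sketched below. The helper name `diagnose_cost_history`, the window size and the tolerance are all assumptions made for illustration, not part of any library, and the thresholds are arbitrary. It looks only at the last few cost values and labels them as stabilized, still decreasing, diverging or oscillating.

```python
import numpy as np

def diagnose_cost_history(costs, window=20, rel_tol=1e-3):
    """Label the last `window` cost values (hypothetical helper, arbitrary thresholds)."""
    tail = np.asarray(costs[-window:], dtype=float)
    if not np.all(np.isfinite(tail)):
        return "diverging"            # cost overflowed to inf or became NaN
    spread = tail.max() - tail.min()
    scale = max(np.abs(tail).max(), 1e-12)
    if spread / scale < rel_tol:
        return "stabilized"           # cost has flattened out
    steps = np.diff(tail)
    if np.all(steps <= 0):
        return "still decreasing"     # moving monotonically lower
    if np.all(steps >= 0):
        return "diverging"            # growing larger at every step
    return "oscillating"              # going up and down without settling

# Synthetic cost histories, made up purely to exercise the helper.
decreasing = [1.0 / (i + 1) for i in range(100)]
flat = [0.5] * 50
bouncing = [0.5 + 0.1 * (-1) ** i for i in range(50)]
growing = [2.0 ** i for i in range(50)]

for history in (decreasing, flat, bouncing, growing):
    print(diagnose_cost_history(history))
```

With mini-batch training the cost is noisy from one iteration to the next, so a history that is decreasing on average would be labelled "oscillating" by this strict check. Averaging the cost per epoch before calling the helper is one possible way around that.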

If you find a local minimum, there is not really any practical way to know if that is the global minimum unless it happens to actually be J = 0. All you can do is try again with different initial conditions and see if you find a lower cost. But if you look at the Yann LeCun paper that I linked on your “saddle point” thread, that makes the point that finding the global minimum is not really what you want in any case: that would represent very serious overfitting on the training set. The overall message of that paper is the reason that Prof Ng tells us in the lectures here in Course 2 that the concern over finding bad local optima is not really that much of a problem in real world sized networks.
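
The "try again with different initial conditions" idea can be sketched on a toy problem. The one-parameter cost below is a made-up function with two minima of different depth, and `gradient_descent` and `best_of_restarts` are hypothetical helpers written for this example only. Each run ends in whichever minimum its starting point leads to, and comparing the final costs is the only way the code can tell which run did better.

```python
import numpy as np

def cost(w):
    # Hypothetical 1-D cost with two minima of different depth.
    return w**4 - 3 * w**2 + w

def grad(w):
    # Derivative of the cost above.
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w0, lr=0.01, steps=2000):
    """Plain gradient descent from the starting point w0."""
    w = float(w0)
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def best_of_restarts(starts):
    """Run gradient descent from every start and keep the lowest final cost."""
    runs = []
    for w0 in starts:
        w = gradient_descent(w0)
        runs.append((cost(w), w, float(w0)))
    return min(runs, key=lambda run: run[0])

# Random starting points in an assumed range of [-2, 2].
rng = np.random.default_rng(0)
starts = rng.uniform(-2.0, 2.0, size=10)
best_cost, best_w, best_start = best_of_restarts(starts)
print(f"lowest cost found: {best_cost:.3f} at w = {best_w:.3f}")
```

Even here, the lowest cost found is only the best among the runs that were tried. Nothing in the procedure proves it is the global minimum.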

Anbu · July 22, 2021, 5:20pm

Thanks, sir. Is the picture below a correct illustration of what we can call wandering around the minimum?

paulinpaloalto · July 22, 2021, 5:37pm

Yes, that is a good illustration of what I would call “oscillation”.
