In the learning rate video, an example of a non-linear function is given, regarding what happens if we take a point which is already at a local minimum. However, I was wondering: if we take a point at a local maximum for w, will gradient descent still work? The slope at that point will be 0, so w will remain the same and it won't move towards the local minimum. What is the solution to this?
If you are already at a local minimum, then you're done, right? The question is just how you realize that. When to stop the iterations is always a decision you have to make. One common way is to monitor whether the cost value is continuing to decrease or not. When it stabilizes and stops decreasing, there is no point in continuing, which is what would happen if you land exactly on a point where the gradients are zero.
Of course it can also happen that you slightly overshoot and then the cost oscillates around a low value. But if your learning rate is too high, the oscillations can grow larger instead of staying close to the minimum. So you have some work to do here: pick a good learning rate and a good number of iterations.
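Here is a rough sketch of what that looks like in code. This is not the course's lab code; the toy data, function names, and tolerance value are just made up for illustration. It runs gradient descent on a simple squared-error cost, stops once the cost stabilizes, and shows how a learning rate that is too large makes the cost grow instead of settling:

```python
import numpy as np

# Toy data, made up for illustration: y = 2x exactly.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

def cost(w):
    # Squared-error cost for a one-parameter model f(x) = w * x
    return np.mean((w * x - y) ** 2) / 2

def gradient(w):
    # Derivative of the cost with respect to w
    return np.mean((w * x - y) * x)

def gradient_descent(w, alpha, max_iters=1000, tol=1e-8):
    prev_cost = cost(w)
    for i in range(max_iters):
        w = w - alpha * gradient(w)
        c = cost(w)
        # Stop once the cost has stabilized (stopped decreasing meaningfully)
        if abs(prev_cost - c) < tol:
            return w, c, i + 1
        prev_cost = c
    return w, cost(w), max_iters

print(gradient_descent(w=0.0, alpha=0.10))  # converges near w = 2 in a few iterations
print(gradient_descent(w=0.0, alpha=0.30))  # too large: the cost blows up instead of shrinking
```

The second call is the "oscillations getting larger" case: each step overshoots the minimum by more than the last, so the cost keeps growing and the stopping test never triggers.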
If you continue through MLS and eventually take DLS, you will later learn about more sophisticated algorithms for doing gradient descent that use dynamic learning rates.
Thanks for your answer! I guess you misinterpreted my question. It was that if we take our initial point at a local maximum, then we won't be able to reduce the cost function and reach the local minimum, since the slope is 0 at the local maximum. How should we tackle this problem?
Choose a different initial point. Start over with random initialization with a different seed value or with no seed value at all. There is never any guarantee that gradient descent is just going to work with no effort on your part. You need to analyze the results and take action when it doesn’t work.
Also note that, as a general matter, the probability that a random initial point just happens to be a local maximum, a local minimum, or a saddle point (another point with zero gradients) is pretty low.
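If you want something concrete, here is a minimal sketch of that idea. The helper name and the non-convex example cost are hypothetical, not anything from the course: it just re-draws the initial point with a different seed on each restart, stops a run if the gradient at the current point is essentially zero, and keeps the best result.

```python
import numpy as np

def run_with_restarts(cost, gradient, alpha=0.01, n_restarts=5, iters=1000):
    best_w, best_cost = None, np.inf
    for seed in range(n_restarts):
        rng = np.random.default_rng(seed)   # a different seed for each restart
        w = rng.normal()                    # fresh random initial point
        for _ in range(iters):
            g = gradient(w)
            if abs(g) < 1e-12:              # zero gradient: stuck at a flat point, so stop this run
                break
            w = w - alpha * g
        c = cost(w)
        if c < best_cost:                   # keep the best run across restarts
            best_w, best_cost = w, c
    return best_w, best_cost

# A made-up non-convex cost with a local maximum at w = 0 and minima at w = ±sqrt(1.5)
print(run_with_restarts(cost=lambda w: w**4 - 3 * w**2,
                        gradient=lambda w: 4 * w**3 - 6 * w))
```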
At this point in the course, we’re only using simple convex cost functions. They’re shaped like parabolas with positive 2nd derivatives.
So they have only one minimum, and no local maxima.
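For example, with the squared-error cost for linear regression in one variable, the second derivative with respect to w is a sum of squares, so it can never be negative:

```
J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \left( w x^{(i)} + b - y^{(i)} \right)^2
\qquad\Longrightarrow\qquad
\frac{\partial^2 J}{\partial w^2} = \frac{1}{m} \sum_{i=1}^{m} \left( x^{(i)} \right)^2 \ge 0
```

That's why the cost surface is a bowl with a single minimum: gradient descent can't start at or get trapped on a local maximum, because there isn't one.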
Got it, thanks!