Gradient descent with two local minima

In the video above, I didn’t understand: if two local minima exist, how would the algorithm get to the global minimum?

There is no guarantee that gradient descent will find the global minimum.

That’s why it’s important to use convex cost functions whenever possible. They only have one minimum.
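To see why a convex cost function is safe, here is a minimal sketch (a toy function, not the course’s actual cost function): plain gradient descent on the convex function f(x) = x², whose only minimum is at x = 0. Any starting point converges to the same answer.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Plain gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

grad_f = lambda x: 2 * x  # derivative of f(x) = x**2

print(gradient_descent(grad_f, x0=5.0))   # converges close to 0
print(gradient_descent(grad_f, x0=-3.0))  # converges close to 0
```

Because a convex function has a single minimum, the choice of starting point only affects how long convergence takes, not where it ends up.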


Hi @sujirsachin, there are two parts to this question. The first part is the local minimum, which is the smallest output of a function within a specific range of inputs; since you can pick different ranges, a function can have several local minima. The second part is the global minimum, which is the smallest output over the entire domain of the function (the lowest value across all possible inputs).

Let’s evaluate an example


In the picture above, the x-axis represents the inputs and the y-axis the function’s outputs.


In the range from 1 to 5, the minimum value is -1 (a local minimum)
In the range from 5 to 10, the minimum value is 2 (another local minimum)
In the range from 10 to 15, the minimum value is -1.3 (another local minimum)

In this case, assuming the domain of the function is x = 1 to 15, the global minimum is -1.3: it is the lowest value across all the intervals.
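In code, that comparison is just taking the smallest of the local minima (the values below are the ones from the description above, not read off an actual plot):

```python
# Local minima over each sub-range, as described above
local_minima = {
    "x in [1, 5]": -1.0,
    "x in [5, 10]": 2.0,
    "x in [10, 15]": -1.3,
}

# The global minimum is the smallest local minimum over the whole domain
global_minimum = min(local_minima.values())
print(global_minimum)  # -1.3
```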

When you optimize, you can get “stuck” at a local minimum and never reach the global minimum, because gradient descent only follows the local slope; to be certain you have the global minimum, you would need to evaluate the function across its entire domain.
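Here is a small sketch of getting “stuck” (a toy non-convex function I chose for illustration, not the one in the plot): f(x) = x⁴ - 2x² + 0.5x has a shallow local minimum near x ≈ 0.93 and a deeper global minimum near x ≈ -1.06. The same gradient-descent loop lands in a different minimum depending only on where it starts.

```python
def f(x):
    return x**4 - 2 * x**2 + 0.5 * x  # toy non-convex "cost"

def grad_f(x):
    return 4 * x**3 - 4 * x + 0.5  # derivative of f

def gradient_descent(grad, x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

x_right = gradient_descent(grad_f, x0=2.0)   # starts right of the "hill"
x_left = gradient_descent(grad_f, x0=-2.0)   # starts left of the "hill"
print(x_right, f(x_right))  # stuck at the shallow local minimum
print(x_left, f(x_left))    # reaches the deeper global minimum
```

Both runs stop where the gradient is (near) zero; neither run can “see” that a lower valley exists elsewhere.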

Let me know if this helps!


@pastorsoto Thank you so much for taking the time to write a detailed explanation.

One follow-up question: so if we are “stuck” at a local minimum and never reach the global minimum, there’s no way we can reach the global minimum, correct?

Yes, selecting the right parameters for optimizing your model can help you reach the global minimum. The problem is that you won’t know whether you are at a local or the global minimum; that’s what @TMosh was suggesting when he pointed out that you should use a convex cost function.

If you have to use a non-convex cost function (such as for training a neural network), then you can train multiple times using different initial weight values. Then compare all of the solutions and select the one with the lowest cost.
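The multiple-restart idea can be sketched like this (reusing the toy non-convex function from the earlier example, which is my own illustration, not the course’s cost function): run the same training loop from several random starting points, then keep the solution with the lowest cost.

```python
import random

def f(x):
    return x**4 - 2 * x**2 + 0.5 * x  # toy non-convex "cost"

def grad_f(x):
    return 4 * x**3 - 4 * x + 0.5

def gradient_descent(grad, x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

random.seed(0)  # reproducible starting points
starts = [random.uniform(-2, 2) for _ in range(10)]  # random "initial weights"
solutions = [gradient_descent(grad_f, x0) for x0 in starts]
best = min(solutions, key=f)  # compare all runs, keep the lowest cost
print(best, f(best))
```

With enough restarts spread over the domain, at least one run usually starts in the global minimum’s basin, so the best-of-all-runs answer is much more likely to be the global minimum than any single run.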

Sometimes you don’t need to find the global minimum - often any minimum that is “good enough” will do.
