Hi team, I have enrolled in the Machine Learning Specialization on Coursera. I have a doubt about the gradient descent topic: it is mentioned that gradient descent finds a local minimum, not necessarily the lowest value of J(W). Instead of using gradient descent, wouldn't just looking at the lowest value of J(W) and taking its corresponding W give us the most optimal W with the lowest error?
In a 2-dimensional model (using linear regression) it's very easy to plot and find the minimum (probably), but in neural networks with many dimensions (even thousands), visualization and analysis are unimaginable. That is why you cannot really find the global minimum unless you "get lucky".
That's easy to say, but the question is how you actually implement it. As Gent points out, we frequently have thousands or even millions of individual parameters, each of which is a real number, meaning we have a very large number of choices for each of those numbers (2^64 choices if we're using 64-bit floating point). So how exactly do you go about figuring out what the minimum possible value of J actually is in a case like that?
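Just to put a rough number on it (this is my own back-of-the-envelope sketch, not anything from the course), here is what an exhaustive search would cost even with a laughably coarse grid per parameter:

```python
import math

n_params = 10_000        # even a small neural network has this many weights
values_per_param = 100   # a very coarse grid, far coarser than float64 allows

# Exhaustive search needs values_per_param ** n_params evaluations of J.
# Work with the log to avoid an astronomically large integer:
log10_combinations = n_params * math.log10(values_per_param)
print(f"grid points to evaluate: about 10^{log10_combinations:.0f}")
# -> about 10^20000, versus roughly 10^80 atoms in the observable universe
```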
A lot of smart mathematicians have been thinking about this general problem for a long time, and the best they've come up with so far is basically Gradient Descent. The general term for that type of algorithm is gradient-based optimization methods (Conjugate Gradient Methods are one well-known family of them). We start by learning how to implement the simplest form here in DLS C1. Then we learn some more sophisticated techniques that can help with convergence in DLS C2.
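To make that concrete, here is a minimal sketch of the simplest form (plain batch gradient descent on a linear regression cost). The function name, learning rate, and iteration count are my own illustrative choices, not the course's notation:

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, n_iters=1000):
    """Plain batch gradient descent on the cost J(w, b) = (1/2m) * sum(err^2)."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(n_iters):
        y_hat = X @ w + b            # current predictions
        err = y_hat - y
        dw = (1 / m) * (X.T @ err)   # dJ/dw
        db = (1 / m) * err.sum()     # dJ/db
        w -= lr * dw                 # step "downhill" along the gradient
        b -= lr * db
    return w, b

# Toy usage: recover w = 3, b = 1 from noiseless data.
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = 3 * X[:, 0] + 1
w, b = gradient_descent(X, y, lr=0.5, n_iters=2000)
print(w, b)  # ~[3.], ~1.0
```

The key point is that each step only needs the gradient at the current W and B, so the work per step scales with the number of parameters rather than with the size of the search space.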
There are also more levels of subtlety here: e.g., it's not clear you actually want the global minimum of the cost, because that would represent very extreme "overfitting" on the training data. Here's a thread that talks about these issues a bit more and gives references to some papers. If what's said there doesn't make sense right now, please "hold that thought" and listen to what Professor Ng has to tell us in DLS C1 - C5.
Hi Gent, thanks for your explanation. I am relatively new to this field and I never thought about models with multiple dimensions. My initial thought was that we are plotting a fixed-dimension chart of the cost function J (like J vs. W, or J vs. W and B), so I felt simple code like min(J), then taking the W and B values at min(J), could have done the task. But it makes sense now. Thank you!
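For anyone reading this later, here is roughly what I had in mind (the grid bounds and data are made up for illustration). It does work with a single W to scan, which is exactly why it can't scale to thousands of dimensions:

```python
import numpy as np

x = np.linspace(0, 1, 50)
y = 3 * x                                          # pretend the true W is 3

w_grid = np.linspace(-10, 10, 2001)                # candidate values of W
J = ((w_grid[:, None] * x - y) ** 2).mean(axis=1)  # cost J for every candidate
w_best = w_grid[np.argmin(J)]                      # take W at min(J)
print(w_best)                                      # ~3.0
```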
Hi Paul, I really appreciate you taking the time to answer my doubt. I understand it now; I overlooked the possibility that the cost function could have n dimensions, and that we cannot limit the possible values of each dimension to a fixed range. Thank you!