Doubts about local minima in Gradient Descent

That’s easy to say, but the question is how you actually implement it. As Gent points out, we frequently have thousands or even millions of individual parameters, each of which is a real number. That means we’ve got an enormous number of choices for each of those numbers (on the order of 2^{64} distinct values if we’re using 64-bit floating point). So exactly how do you go about figuring out what the minimum possible value of J actually is in a case like that?
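Just to put some rough numbers on that, here’s a back-of-the-envelope sketch. The parameter count (1 million) is purely illustrative, and I’m ignoring details like NaNs and duplicate float patterns:

```python
# Rough back-of-the-envelope: why brute-force search over parameters is hopeless.
# Assumption (illustrative only): 1 million parameters, each a 64-bit float.
n_params = 1_000_000
bit_patterns_per_param = 2 ** 64          # distinct 64-bit patterns per parameter

# Total candidate parameter vectors = (2^64)^n_params = 2^(64 * n_params)
total_candidates_log2 = 64 * n_params     # exponent is 64,000,000
print(f"Candidate parameter vectors: 2^{total_candidates_log2}")
# For comparison, the number of atoms in the observable universe is roughly 2^266.
```

So exhaustively checking every possible setting of the parameters is completely out of the question.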

A lot of smart mathematicians have been thinking about this general problem for a long time, and the best they’ve come up with so far is basically Gradient Descent. It’s the simplest member of the broader family of gradient-based (first-order) optimization methods, which also includes more elaborate techniques such as Conjugate Gradient methods. We start by learning how to implement the simplest form of that here in DLS C1. Then we learn some more sophisticated techniques that can help with convergence in DLS C2.
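Just to make the core idea concrete, here’s a minimal sketch of the basic update in Python/numpy. The cost here is a toy quadratic, not the actual logistic-regression cost from the course, and the names (`J`, `grad_J`, `learning_rate`) are just placeholders I made up for illustration:

```python
import numpy as np

def J(w):
    # Toy convex cost: J(w) = ||w - 3||^2, minimized at w = [3, 3, ..., 3]
    return np.sum((w - 3.0) ** 2)

def grad_J(w):
    # Gradient of the toy cost above: dJ/dw = 2 * (w - 3)
    return 2.0 * (w - 3.0)

def gradient_descent(w0, learning_rate=0.1, num_iterations=100):
    w = w0.copy()
    for _ in range(num_iterations):
        w = w - learning_rate * grad_J(w)   # the core update: step against the gradient
    return w

w_init = np.zeros(5)                        # start from an arbitrary point
w_final = gradient_descent(w_init)
print("w after descent:", w_final)          # should be close to 3.0 in every coordinate
print("J at the end:", J(w_final))          # should be close to 0
```

Instead of searching the whole space, we just start somewhere and repeatedly take a small step in the direction that locally decreases the cost. The real cost functions in the course are much more complicated, but the update rule has exactly this shape.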

There are also more levels of subtlety here: e.g. it’s not clear you actually want the global minimum of the cost, because that would likely represent very extreme “overfitting” on the training data. Here’s a thread that talks about these issues a bit more and gives references to some papers. If what’s said there doesn’t make sense right now, please “hold that thought” and listen to what Professor Ng has to tell us in DLS C1 - C5.
