Gradient descent Algorithm

David5804 · August 19, 2022, 2:23pm

Hi there,
If a cost function fw,b(x) has multiple local minima, could we reach different ones by choosing different learning rates?
And how would we choose a learning rate that reaches the minimum value of the cost function?

TMosh · August 19, 2022, 2:53pm

No, the learning rate only controls the magnitude of the updates to the weights - it does not alter their direction.
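For reference, a sketch of the standard gradient descent update, writing the cost as J(w,b) and the learning rate as α (this notation is an assumption here, not taken from the posts above). The direction of each step comes from the negative gradient; α only scales how far the step goes.

```latex
w := w - \alpha \frac{\partial J(w,b)}{\partial w},
\qquad
b := b - \alpha \frac{\partial J(w,b)}{\partial b}
```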

One remedy for local minima is to start gradient descent from different initial weight values. If you do this many times, you can pick the solution that gives the lowest final cost.
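A minimal sketch of that restart idea, assuming a made-up one-dimensional cost with two local minima. The functions `cost`, `grad`, and `gradient_descent`, along with the starting points and learning rate, are hypothetical choices for illustration only.

```python
def cost(w):
    # Hypothetical non-convex cost with two local minima.
    return w**4 - 3 * w**2 + w

def grad(w):
    # Derivative of the cost above.
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w0, alpha=0.01, steps=2000):
    # Plain gradient descent from the initial weight w0.
    w = w0
    for _ in range(steps):
        w -= alpha * grad(w)
    return w

# Same learning rate every time; only the starting weight changes.
starts = [-2.0, -1.0, 0.0, 1.0, 2.0]
results = [gradient_descent(w0) for w0 in starts]

# Keep the run that ends with the lowest final cost.
best_w = min(results, key=cost)

for w0, w in zip(starts, results):
    print(f"start {w0:+.1f} -> w = {w:+.4f}, cost = {cost(w):+.4f}")
print(f"best: w = {best_w:+.4f}, cost = {cost(best_w):+.4f}")
```

In this toy setup the runs settle into one of two basins depending on where they start, and taking the minimum over the final costs selects the deeper one.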

However, in linear regression and logistic regression, this is not a concern. Both of those cost functions are convex, so there are no local minima.

David5804 · August 20, 2022, 4:22am

But if there are multiple local minima, you have to choose different learning rates to go through all of them, right?

TMosh · August 20, 2022, 5:37am

No, the learning rate has nothing to do with local minima.

David5804 · August 20, 2022, 10:12am

Oh, so it means that you just have to adjust the weights to get to different local minima, right?

shanup · August 20, 2022, 10:45am

Adjust the starting weights and then let gradient descent do its work. As @TMosh mentioned, you can try this with different initial values for the weights and pick the run that delivers the lowest final cost.

In the case of linear regression and logistic regression, there is no risk of getting stuck at a local minimum. There, the learning rate is more of the deciding factor - a lower learning rate means it will take longer to converge. A higher learning rate will get you to convergence faster (provided the learning rate is not so high that it diverges).
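A small sketch of that trade-off, assuming a hypothetical convex cost J(w) = (w - 3)^2 with its single minimum at w = 3. The helper `steps_to_converge` and the three learning rates are illustrative choices, not values from the course.

```python
def steps_to_converge(alpha, w0=0.0, tol=1e-6, max_steps=10_000):
    # Gradient descent on the hypothetical convex cost J(w) = (w - 3)**2.
    # Returns the number of steps needed to get within tol of the minimum,
    # or None if the iterates blow up or the step budget runs out.
    w = w0
    for step in range(1, max_steps + 1):
        w -= alpha * 2.0 * (w - 3.0)  # dJ/dw = 2 * (w - 3)
        if abs(w - 3.0) < tol:
            return step
        if abs(w - 3.0) > 1e6:
            return None  # diverging: each update overshoots further
    return None

slow = steps_to_converge(alpha=0.01)
fast = steps_to_converge(alpha=0.1)
diverged = steps_to_converge(alpha=1.1)

print("alpha=0.01:", slow, "steps")
print("alpha=0.1: ", fast, "steps")
print("alpha=1.1: ", "diverged" if diverged is None else diverged)
```

All three runs head toward the same minimum because the cost is convex; the learning rate changes only how quickly they get there, or whether they get there at all.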

Shubha_Chodagam · August 25, 2022, 3:48am

How do we force different initial weights? Aren't they randomly assigned every time we train?

rmwkwok · August 25, 2022, 4:11am

My suggestion would be to set the random seed before setting up your neural network. This sets the seed for the random generator that is going to be used to actually initialize the weights. Setting it to the same seed value makes sure the weights are initialized to the same set of values. If you set it to a different seed, the weights will be initialized to a different set of values.
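A minimal sketch of the idea in plain Python, using the standard library's `random.Random` as a stand-in for a framework's weight initializer. The helper `init_weights` and the seed values are hypothetical; a real framework has its own seed-setting call that plays the same role.

```python
import random

def init_weights(seed, n=3):
    # Stand-in for a framework's random weight initialization:
    # a generator seeded with `seed` draws n Gaussian weights.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

same_a = init_weights(seed=42)
same_b = init_weights(seed=42)
other = init_weights(seed=7)

print(same_a == same_b)  # same seed, same initial weights
print(same_a == other)   # different seed, different initial weights
```

Looping over several seeds and training once per seed gives a reproducible way to try different starting weights.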

Raymond
