Learning rate on Regularization

David_Long · December 20, 2023, 8:44am

Hi, it looks like regularization tries to keep w small, and to achieve that we add an extra term to the cost function.
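For reference, here is the regularized squared-error cost for linear regression in the form the course uses. This is assumed to be the formula meant here. λ is the regularization parameter, m the number of examples, and n the number of features:

$$
J(\vec{w}, b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2
$$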

So my question is: can we achieve the same effect by increasing the learning rate during gradient descent, instead of changing the cost function?

This course is brilliant, thanks for your effort!

rmwkwok · December 20, 2023, 10:24am

Hello @David_Long,

Can you elaborate more on how increasing the learning rate can keep w small?

Cheers,
Raymond

David_Long · December 21, 2023, 2:46am

Hi @rmwkwok, it is just a hypothesis; I haven't verified it, and it may not be correct. Increasing the learning rate may also cause overshooting, so that gradient descent never converges. But from what I can see in the formula, a larger learning rate makes w decrease or increase more at each step. So what I am really trying to understand is why the extra term added to the cost function helps keep w small. There may be a mathematical explanation, and I would very much appreciate it if you could help me understand it in any way.
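A sketch of the usual mathematical answer follows. It assumes the regularized cost is the squared-error cost plus (λ/2m) Σ w_j², as in the course. Differentiating the extra term adds (λ/m) w_j to the gradient. The gradient-descent update can then be regrouped so that each step first multiplies w_j by a factor slightly below 1, which is why this is often called "weight decay":

$$
\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}w_j
$$

$$
w_j := w_j - \alpha\frac{\partial J}{\partial w_j} = w_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)}
$$

With λ = 0 the factor (1 - αλ/m) is exactly 1. Changing α alone therefore only rescales the step and never introduces this pull toward zero.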

rmwkwok · December 21, 2023, 2:55am

Hello @David_Long,

A hypothesis is fine, but you can show us how you came up with it, right? After all, no one else is going to prove your hypothesis for you.

Now, I assume the gradient descent update formula you mentioned is your “how”, and that is what we can discuss.

The fact is, it only shows that a larger learning rate subtracts a larger value from w, which is not the same as “shrinking” w.

For example, if w = 0.1, a large learning rate might make you subtract 1,000 from it, resulting in w = -999.9.

Is -999.9 small? It is not.

For w to qualify as small, |w| must be small. In other words, -999 is as “large” as 999 in terms of magnitude.

Therefore, a large learning rate does not necessarily shrink w.
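The following is a small numerical sketch of this point. It uses a toy one-feature linear regression without a bias term. The data, the function name `gradient_descent`, and the values of α and λ are all hypothetical choices for illustration. The sketch compares a small learning rate, a large learning rate, and a small learning rate combined with L2 regularization:

```python
def gradient_descent(x, y, alpha, lam=0.0, steps=100, w0=0.0):
    """Fit y ≈ w * x (no bias term, for simplicity) with batch gradient descent.

    Hypothetical helper for illustration. `lam` is the regularization
    strength λ; lam=0 gives the plain squared-error cost.
    """
    m = len(x)
    w = w0
    for _ in range(steps):
        # Gradient of the squared-error part: (1/m) * Σ (w*x_i - y_i) * x_i
        grad_fit = sum((w * xi - yi) * xi for xi, yi in zip(x, y)) / m
        # Regularized update in "weight decay" form:
        # w := w * (1 - alpha*lam/m) - alpha * grad_fit
        w = w * (1 - alpha * lam / m) - alpha * grad_fit
    return w


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # y = 2x exactly, so the unregularized optimum is w = 2

w_small_lr = gradient_descent(x, y, alpha=0.01)             # converges near 2.0
w_large_lr = gradient_descent(x, y, alpha=0.3)              # overshoots; |w| blows up
w_regularized = gradient_descent(x, y, alpha=0.01, lam=10)  # converges near 1.5

print(f"alpha=0.01, lambda=0 : w = {w_small_lr:.4f}")
print(f"alpha=0.3,  lambda=0 : w = {w_large_lr:.4e}")
print(f"alpha=0.01, lambda=10: w = {w_regularized:.4f}")
```

With y = 2x the unregularized minimum is w = 2. The larger α only changes the step size. Here each step overshoots the minimum by more than it corrects, so w flips sign back and forth and |w| grows instead of shrinking. Regularization changes the cost itself, so its minimum moves toward zero (w = 1.5 for this data with λ = 10).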

Cheers,
Raymond

David_Long · December 21, 2023, 3:14am

That explains it well, thanks for your quick response!

rmwkwok · December 21, 2023, 9:54pm

You are welcome, @David_Long.

Cheers,
Raymond
