Hi, can I just check how raising lambda penalises the W parameter mathematically? I have been going through the videos, and I get that raising lambda to an extremely large value basically makes it such that only the b parameter has an effect, essentially producing a straight horizontal line whose height depends on b's value. But I just can't understand how a large increase in lambda reduces W drastically.

Hello @zheng_xiang1,

If we focus on just the regularization term, if w decreases, would it make the cost smaller or larger?

Raymond

Smaller, if w were to be decreased.

Exactly! To optimize the cost, we would want the weights to shrink. Now, what is the gradient of the regularization term?
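To make the first point concrete, here is a small numerical sketch of the L2 regularization term \frac{\lambda}{2m}\sum_j{w_j^2} shrinking as the weights shrink (the values of lam, m, and the weight vectors are made up for illustration):

```python
# Sketch of the L2 regularization term (lambda / (2*m)) * sum of w_j^2.
# The values of lam, m, and the weight vectors below are made up.
def reg_term(w, lam, m):
    return (lam / (2 * m)) * sum(wj ** 2 for wj in w)

lam, m = 1.0, 10
print(reg_term([3.0, -2.0], lam, m))   # larger weights, larger penalty
print(reg_term([0.3, -0.2], lam, m))   # smaller weights, smaller penalty
```

Since every w_j enters the term squared, any move of a weight towards zero makes the term, and hence the cost, smaller.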

Raymond

The gradient should be \frac{\lambda}{m}w^2, I think.

Let's look at this slide, which includes the gradient of the regularization term:

If we look at the bottom left, we have the weight's update formula, and on the bottom right, we have the gradient terms.

We already know that shrinking the weights can reduce the cost, and you said in your first post that you wanted to see this mathematically, so here you go.

Again, let's just look at the regularization term. When w_j is positive:

- from the bottom right, is the regularization's gradient positive or negative?
- in the bottom left, would it drive w_j to increase or decrease? You need the answer from the first question.

Raymond

1) Negative, I think, but I'm not too sure why.

2) Decrease, if 1 is negative.

Oh, so is it because the derivative of the regularization term is negative, and if it gets larger it reduces the derivative of the cost function?

Let me re-ask my question.

- What is the gradient of the regularization term?
- If w_j is positive, is that term positive or negative? We need to be very careful about the signs.
- in the bottom left, would it drive w_j to increase or decrease? We need to be very careful about the signs.

The answer to question 1 is \frac{\lambda}{m}w_j. Can you read this from the slide? We need to be careful when reading it. If you are not familiar with differentiation, I suggest you refer to other course materials about gradient descent without regularization, and compare them to find the additional term due to regularization. It will take some time, but going through that exercise should be helpful.

Take your time.

Here are my answers:

- \frac{\lambda}{m}{w_j}
- positive, because each symbol in the term is positive.
- decrease, because each symbol in -\alpha\frac{\partial{J}}{\partial{w_j}} is positive, except for the minus sign.

What would be the answers if, instead, w_j is negative?

- \frac{\lambda}{m}{w_j}
- negative. Try to verify it.
- increase. Try to verify it.
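A quick way to verify both cases is to code up just the regularization part of the update, w_j := w_j - \alpha\frac{\lambda}{m}w_j, and try a positive and a negative weight (the default values of lam, m, and alpha below are made up):

```python
# Sketch: sign of the regularization gradient (lam/m) * w_j, and the
# direction of the update step w_j := w_j - alpha * (lam/m) * w_j.
# The default values of lam, m, and alpha are made up for illustration.
def reg_update(w_j, lam=1.0, m=10, alpha=0.1):
    grad = (lam / m) * w_j        # gradient of (lam/(2m)) * w_j**2; same sign as w_j
    return w_j - alpha * grad     # the update moves w_j towards zero

print(reg_update(2.0))    # positive w_j: positive gradient, w_j decreases
print(reg_update(-2.0))   # negative w_j: negative gradient, w_j increases
```

In both cases the magnitude of w_j drops, matching the sign analysis above.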

The conclusion is:

if w_j is positive, the regularization term tends to decrease w_j;

if w_j is negative, the regularization term tends to increase w_j.

Both tend to push w_j towards zero. Therefore, it shrinks the weights.

You asked how "lambda" works in the title of this thread. The answer is: the larger the \lambda, the stronger the "pushing" force.
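You can see this "pushing" force by repeatedly applying only the regularization part of the update for two different values of \lambda (alpha, m, the starting w_j, and the step count below are made up for illustration):

```python
# Sketch: iterating only the regularization part of the update,
# w_j := w_j - alpha * (lam/m) * w_j. A larger lam shrinks w_j faster.
# alpha, m, the starting w_j, and the step count are made up.
def shrink(w_j, lam, m=10, alpha=0.1, steps=50):
    for _ in range(steps):
        w_j -= alpha * (lam / m) * w_j
    return w_j

print(shrink(2.0, lam=1.0))    # weak push towards zero
print(shrink(2.0, lam=10.0))   # much stronger push towards zero
```

Each step multiplies w_j by (1 - \alpha\frac{\lambda}{m}), so a larger \lambda drives the weight towards zero in fewer iterations.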

Cheers,

Raymond

THANKS A BUNCH, will revisit it soon!!