I really can’t understand how, from this regularization formula, we can say that increasing lambda will reduce w?
Hi, @someone555777 !
When we increase lambda, we increase the overall value of the cost function. That means that when training tries to minimize the cost function, it will work hardest on reducing the values of w, because the regularization term now has the biggest impact on the cost.
First, understand why we need the regularization term. When our model overfits (it does a good job on the training data but a poor job on the test data), it means our model does not generalize well. So, we need to penalize it. Penalization means we need to increase the cost (J). So, when we add that extra regularization term, we are trying to increase the cost. However, at the same time, we are also trying to decrease the cost by optimization (gradient descent).
So, gradient descent will try to reduce the cost while the regularization term will try to increase it. That is a clash, right? The larger the lambda value, the larger the cost will be. But gradient descent is trying hard to minimize the cost, so it will reduce the values of the parameters (w).
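A quick way to see this numerically: below is a minimal sketch of the L2 penalty \frac{\lambda}{2m}\sum_j w_j^2 that gets added to the cost (the values of m, w, and \lambda are made up for illustration).

```python
import numpy as np

def l2_penalty(w, lam, m):
    # The regularization term added to the cost: (lambda / (2*m)) * sum(w_j^2)
    return (lam / (2 * m)) * np.sum(w ** 2)

m = 50                              # made-up number of training examples
w_small = np.array([0.5, -0.3])     # small-magnitude parameters
w_large = np.array([5.0, -3.0])     # large-magnitude parameters

for lam in [0.0, 1.0, 10.0]:
    print(f"lambda={lam:>4}: penalty(small w)={l2_penalty(w_small, lam, m):.4f}, "
          f"penalty(large w)={l2_penalty(w_large, lam, m):.4f}")
```

With \lambda = 10 the large-magnitude w adds 3.4 to the cost while the small-magnitude w adds only 0.034, so the bigger \lambda is, the more the optimizer gains by shrinking w.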
Best,
Saif.
So, I still can’t understand why the parameter w can only be reduced, and why it cannot be increased too. As I understand it, the cost function can increase in both directions: when w increases as well as when it decreases.
Hi @someone555777,
The gradient descent update formula is w := w - \alpha\times\frac{\lambda}{2m}\times(2w) - \cdots. Note that we are focusing on the regularization term, so I didn’t show the error term and replaced it with \cdots.
Ask yourself these questions, given that \alpha, \lambda, and m are always positive:
- if w is positive, will the regularization term make the updated w be larger in magnitude (sign not considered), or smaller?
- if w is negative, will the regularization term make the updated w be larger in magnitude (sign not considered), or smaller?
If you find both to be smaller in magnitude (sign not considered), then that explains the question in the title. If you find either of them to be larger in magnitude, tell me which one and explain it, and we can discuss your explanation.
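If you want to check your answers numerically, here is a minimal sketch that applies only the regularization part of the update above (the values of \alpha, \lambda, and m are made up):

```python
# Regularization part of the update only:
# w := w - alpha * (lambda / (2*m)) * (2*w)  =  w * (1 - alpha*lambda/m)
def reg_update(w, alpha, lam, m):
    return w - alpha * (lam / (2 * m)) * (2 * w)

alpha, lam, m = 0.01, 1.0, 50   # made-up, typical-scale values

for w in [0.8, -0.8]:
    w_new = reg_update(w, alpha, lam, m)
    print(f"w = {w:+.4f} -> {w_new:+.4f}  "
          f"(|w| {'decreased' if abs(w_new) < abs(w) else 'increased'})")
```

The factor (1 - \alpha\lambda/m) that multiplies w is the key quantity to look at.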
Raymond
So, do I understand correctly that it can reduce the influence (the absolute value of the number), but never change the sign?
I think that’s exactly backwards: the regularization term reduces the magnitude of the elements of w always, not never. That was Raymond’s point. Work out the examples that he suggests.
Well, if you get fully technical, you can probably construct an example in which the subtraction of the “update” term actually flips the sign of the corresponding element of w, but the point is that this is not what happens with typical values of \alpha and \lambda. If you use values of \alpha and \lambda large enough to flip the sign, then you’ll likely get divergence in any case, so that’s not really a relevant example for the point Raymond is making here.
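To make that concrete, here is a rough sketch (with made-up numbers) of what repeated applications of only the regularization part of the update, w \leftarrow w \times (1 - \alpha\lambda/m), do in a typical setting versus an extreme one:

```python
def run(w, factor, steps=5):
    # Repeatedly apply the regularization-only update w <- w * factor,
    # where factor = 1 - alpha*lambda/m.
    history = [round(w, 4)]
    for _ in range(steps):
        w *= factor
        history.append(round(w, 4))
    return history

# Typical: alpha*lambda/m is tiny, factor is just below 1 -> |w| slowly decays, sign is kept.
print(run(0.8, 1 - 0.0002))
# Extreme: alpha*lambda/m = 2.5, factor = -1.5 -> the sign flips every step and |w| blows up.
print(run(0.8, 1 - 2.5))
```

In the second case the update itself is unstable, which matches the point above: sign flips only show up in settings where training would diverge anyway.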
As Paul explained, everything which is not forbidden is allowed.
If you have an interesting special case in mind, feel free to share it.