Lambda, w=0, regularization

In the lecture “Why regularization reduces overfitting?”, at time 1:23:
Why does cranking \lambda up to a big value set w = 0?
Thank you.


It doesn’t set any of the W_{ij} values to zero. It just causes the optimization to push the values to be smaller. The regularization term is \displaystyle \frac{\lambda}{2m} times the squared L2 norm of w, so if you want that term to be small and \lambda is large, that forces the norm of w to be small, right? This is not some deep or subtle point. What you are minimizing is the sum of the usual “log loss” cost plus the L2 regularization term.
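
As a concrete illustration, here is a minimal NumPy sketch of that combined cost for logistic regression (the function and variable names are just for illustration, not from the course code):

```python
import numpy as np

def regularized_cost(w, b, X, y, lambd):
    """Log loss plus the L2 regularization term (lambda / (2m)) * ||w||^2."""
    m = X.shape[0]                      # number of training examples
    z = X @ w + b                       # linear part
    a = 1.0 / (1.0 + np.exp(-z))        # sigmoid activation
    log_loss = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
    l2_term = (lambd / (2 * m)) * np.sum(w ** 2)
    return log_loss + l2_term
```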

Of course if you were only seeking the minimum value of \displaystyle \frac{\lambda}{2m}||w||^2, then there is an obvious solution: w = 0. But the point is that would (one hopes) not give a very good solution for your actual model, meaning that the “log loss” term would be large. So what you are seeking is a value of \lambda that gives a good “balance” between the log loss term and the regularization term. If you use some huge value of \lambda, you might very well end up with most elements of w very close to 0 and thus a bad solution. It wouldn’t overfit, but it wouldn’t be useful for much either. :nerd_face:
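
To see that “balance” in action, here is a hedged sketch (plain gradient descent on the regularized cost above, with synthetic data and hyperparameters I made up, not taken from the lecture) showing how the norm of w shrinks as \lambda grows:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([2.0, -3.0, 1.0, 0.0, 0.5])
y = (X @ true_w > 0).astype(float)          # simple synthetic labels

def fit(X, y, lambd, lr=0.1, steps=2000):
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(steps):
        a = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        dw = X.T @ (a - y) / m + (lambd / m) * w   # gradient includes the L2 term
        db = np.mean(a - y)
        w -= lr * dw
        b -= lr * db
    return w

for lambd in [0.0, 1.0, 100.0]:
    w = fit(X, y, lambd)
    print(f"lambda = {lambd:6.1f}   ||w|| = {np.linalg.norm(w):.4f}")
```

With \lambda = 0 the weights grow as large as the data allows; with a huge \lambda they are driven very close to (but not exactly) zero, which is the point about losing model usefulness.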