Why does W get close to 0 when lambd is large?

sunson29 · August 19, 2022, 5:18pm

In the video "Why Regularization Reduces Overfitting?", the teacher mentioned many times that if lambd is big, w will get close to 0. Can someone explain why, in terms of the new cost function J? I want to understand this from the equation.

My understanding so far: if lambd is big, the new regularization term will be big, so J is big. Since we want J to be small, we want the new regularization term to be small as well. So if I make lambd big, w will go small? I am not sure if I truly understand it or not…
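For reference, the L2-regularized cost from the course can be written as below. This follows the lecture's notation as best as can be reconstructed: m training examples, L layers, cross-entropy loss \mathcal{L}, and the squared Frobenius norm of each weight matrix.

```latex
J\big(W^{[1]}, b^{[1]}, \dots, W^{[L]}, b^{[L]}\big)
  = \underbrace{\frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\big(\hat{y}^{(i)}, y^{(i)}\big)}_{\text{cross-entropy cost}}
  + \underbrace{\frac{\lambda}{2m}\sum_{l=1}^{L} \big\lVert W^{[l]} \big\rVert_F^2}_{\text{L2 term}},
\qquad
\big\lVert W^{[l]} \big\rVert_F^2 = \sum_{i}\sum_{j} \big(W^{[l]}_{ij}\big)^2
```

Because the second term is a sum of squared weights multiplied by \lambda, a larger \lambda makes every nonzero weight more expensive in J.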

balaji.ambresh · August 19, 2022, 5:36pm

The regularization term comes into effect during the backward pass, when the weights are updated using the gradients. The additional regularization term in the gradient pushes the weights toward smaller values.
Please check the 2nd assignment for the week. You’ll implement regularization from scratch.
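A minimal NumPy sketch of that update, assuming the penalty is (lambd / (2m)) * sum(W**2), so its gradient is (lambd / m) * W. The function name `l2_gradient_step` and the numbers are hypothetical, chosen only for illustration; this is not the assignment's code.

```python
import numpy as np

def l2_gradient_step(W, dW_data, lambd, m, learning_rate):
    """One gradient-descent update with an L2 penalty of (lambd / (2 m)) * sum(W**2).

    dW_data is the gradient of the unregularized (cross-entropy) cost w.r.t. W.
    The penalty adds (lambd / m) * W to that gradient.
    """
    dW = dW_data + (lambd / m) * W
    return W - learning_rate * dW

# Hypothetical example: pretend the data term contributes no gradient at all,
# so the only force acting on W is the regularization term.
W = np.array([[2.0, -1.0], [0.5, 3.0]])
zero_grad = np.zeros_like(W)
W_next = l2_gradient_step(W, zero_grad, lambd=10.0, m=100, learning_rate=0.5)
print(W_next)
```

Rearranging the update gives W_next = (1 - learning_rate * lambd / m) * W - learning_rate * dW_data, which is why L2 regularization is also called "weight decay". Here every weight is multiplied by 1 - 0.5 * 10 / 100 = 0.95 on each step, and a larger lambd shrinks the weights faster.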

sunson29 · August 19, 2022, 6:19pm

Thanks for the reply, but that’s not what I am asking. I am asking why; I want to understand it from the math side.

paulinpaloalto · August 19, 2022, 6:22pm

Look at what the L2 regularization term is: it’s the sum of the squares of all the individual elements of all the W matrices times a constant based on \lambda. So how do you minimize that? By making the absolute values of all those elements as small as possible, right? Well there is an absolute minimum there: set them all to zero. But then the model becomes trivial, meaning that it ignores the input data and always makes the same prediction on any input. So the important question is what value to choose for \lambda: if you make it large, then the regularization term dominates J and all it does is push the W values to zero. Of course the point is that J is now the sum of two terms and you want a balance between the “real” cost based on cross entropy loss (which actually measures the accuracy of the predictions of the model) and the regularization term. You want to suppress the weights somewhat to reduce overfitting, but not too much or the model just becomes 0.
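To see the tradeoff directly in an equation, here is a toy cost with a single weight w. The "data" part is assumed to be a simple quadratic with its minimum at w = a (chosen only so the algebra is easy; it is not the cross-entropy cost), and constants such as 1/(2m) are folded into \lambda.

```latex
\begin{aligned}
J(w) &= (w - a)^2 + \lambda w^2, \qquad \lambda \ge 0 \\
\frac{dJ}{dw} &= 2(w - a) + 2\lambda w = 0
  \;\Longrightarrow\; w^{*} = \frac{a}{1 + \lambda} \\
\text{e.g. } a = 4: \quad & \lambda = 1 \Rightarrow w^{*} = 2, \qquad \lambda = 9 \Rightarrow w^{*} = 0.4
\end{aligned}
```

Since d^2J/dw^2 = 2 + 2\lambda > 0, w* is the minimum. As \lambda \to 0 it approaches the unregularized answer a, and as \lambda \to \infty it goes to 0: the penalty overwhelms the data term's pull toward a.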

sunson29 · August 19, 2022, 7:04pm

I think I follow. So, I want to double-check:

Since we want J to be small, we want the new regularization term to be small as well. If I make lambd big, then w will go small because of the sum of squares. Am I correct?

paulinpaloalto · August 19, 2022, 8:03pm

Yes, that’s correct. The point is that with L2 regularization, there are now two terms in the cost J:

The normal cross entropy loss, which measures the actual accuracy of the predictions of the model.
The L2 regularization term.

If you make the \lambda value too large, then the L2 term is going to dominate the cost and just drive the W values close to 0. In the limit, they all become 0, which, as I pointed out in my previous response, makes the model useless.
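The same behavior can be checked numerically. The sketch below swaps in L2-regularized linear regression (ridge regression) instead of a neural network, because its minimizer has the closed form w = (X^T X + \lambda I)^{-1} X^T y. The synthetic data and the \lambda values are arbitrary assumptions for illustration.

```python
import numpy as np

def ridge_weights(X, y, lambd):
    """Closed-form minimizer of ||X w - y||^2 + lambd * ||w||^2."""
    n_features = X.shape[1]
    A = X.T @ X + lambd * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data: 50 examples, 3 features, known "true" weights plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([3.0, -2.0, 1.0])
y = X @ true_w + 0.1 * rng.normal(size=50)

for lambd in [0.0, 1.0, 10.0, 100.0, 1e4, 1e6]:
    w = ridge_weights(X, y, lambd)
    data_error = np.mean((X @ w - y) ** 2)  # the "real" cost, without the penalty
    print(f"lambd={lambd:>9.1f}  ||w||={np.linalg.norm(w):.4f}  training MSE={data_error:.4f}")
```

As lambd grows, the weight norm shrinks toward 0 while the unregularized training error rises: once the penalty dominates, the fitted model stops using the data.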

So you need both terms in J to play their intended role, which requires that you tune the value of \lambda appropriately. Prof Ng spends lots of time in the lectures in Week 1 and Week 2 discussing how to tune hyperparameters in a systematic way.
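As a rough illustration of tuning \lambda, one simple option is to fit the model for several candidate values on a training split and keep the value with the lowest error on a held-out validation split. This is a minimal sketch, not the procedure from the lectures: it reuses ridge regression for its closed form, and the data, split sizes, candidate grid, and the helper name `pick_lambda` are all hypothetical.

```python
import numpy as np

def ridge_weights(X, y, lambd):
    """Closed-form minimizer of ||X w - y||^2 + lambd * ||w||^2."""
    return np.linalg.solve(X.T @ X + lambd * np.eye(X.shape[1]), X.T @ y)

def pick_lambda(X_train, y_train, X_val, y_val, candidates):
    """Return the candidate lambd with the lowest validation MSE, plus all scores."""
    scores = {}
    for lambd in candidates:
        w = ridge_weights(X_train, y_train, lambd)
        scores[lambd] = np.mean((X_val @ w - y_val) ** 2)
    best = min(scores, key=scores.get)
    return best, scores

# Few examples and many features, so the unregularized fit can overfit.
rng = np.random.default_rng(1)
n, d = 40, 20
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:3] = [2.0, -1.0, 0.5]  # only a few features actually matter
y = X @ true_w + 0.5 * rng.normal(size=n)

X_train, y_train = X[:25], y[:25]
X_val, y_val = X[25:], y[25:]
candidates = [0.0, 0.1, 1.0, 10.0, 100.0, 1000.0]
best, scores = pick_lambda(X_train, y_train, X_val, y_val, candidates)
for lambd, mse in scores.items():
    print(f"lambd={lambd:>7.1f}  validation MSE={mse:.3f}")
print("chosen lambd:", best)
```

Very small values leave room to overfit the training split, and very large values push the weights toward 0 and underfit; the validation error is what arbitrates between the two.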

sunson29 · August 19, 2022, 8:14pm

Thank you, thank you Paul! You always help!
