Regularization, lambda/m

Roza_Hakimova · December 21, 2021, 8:46am

Hi!
Why are we sure that lambda/m is less than 0? Does that mean the values have to be chosen specifically so that the term lambda/m is less than 0?
Thank you in advance.

paulinpaloalto · December 21, 2021, 10:28am

I think you must be misinterpreting something. Are you referring to something Prof Ng says in the lectures? Or is it some wording in the assignment? Please give us a reference (name of the lecture and time offset or the section of the assignment) where you see this statement.

Note that \lambda is a positive hyperparameter that you choose, and it controls the strength of L2 regularization. Larger values of \lambda suppress the learned weight values more strongly and thus give more of a regularization effect.

One interesting question you may ask is why they bother dividing by the number of samples m, especially since the sum of squared weights in the L2 term does not itself depend on the number of samples. I don’t know the definitive answer to that question, since not everyone specifies it that way. Of course, it just changes the value that you would choose for \lambda. With the division by m, as the size of your training set increases, you get less of a regularization effect for any given choice of \lambda. One way to decrease overfitting is to get more training data, and in the limit of an infinite amount you (in theory) would not need regularization. But this is just my guess, not something I have heard Prof Ng actually say.
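To make the effect of dividing by m concrete, here is a minimal sketch. It assumes the convention of an L2 cost term of the form \frac{\lambda}{2m} \sum w^2; the function name `l2_penalty`, the weights and the value of \lambda are all made up for illustration.

```python
def l2_penalty(weights, lam, m):
    """L2 cost term, assuming the (lambda / (2 * m)) * sum(w^2) convention."""
    return lam / (2 * m) * sum(w * w for w in weights)


# Hypothetical weights and lambda, chosen only for illustration.
weights = [0.5, -1.0, 2.0]
lam = 0.7

small_m = l2_penalty(weights, lam, m=100)
large_m = l2_penalty(weights, lam, m=10_000)

# Same weights and same lambda: the penalty shrinks by the same factor
# by which the training set grows (here, a factor of 100).
print(small_m, large_m, small_m / large_m)
```

So with \lambda held fixed, a training set 100 times larger gives a penalty 100 times smaller, which is the "less of a regularization effect" described above.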

Roza_Hakimova · December 21, 2021, 1:14pm

Sure, Prof. Ng states this here:
Lecture “Regularization”, time 8:43
Though I’d like to correct my question (I am sorry). He’s saying (1-alpha*lambda/m) is less than 1. I’m still confused about how we know this.
Thank you!

paulinpaloalto · December 21, 2021, 4:44pm

I suggest you listen again more carefully. What he says is that (1 - \alpha \frac{\lambda}{m}) is slightly less than 1. That is because you are taking 1 and subtracting something positive from it: \alpha, \lambda and m are all positive. Of course, how close to 1 the result is depends on the particular values you have chosen for the hyperparameters \alpha (the learning rate) and \lambda (the regularization parameter) and on the value of m, but in most cases \alpha will be a decimal less than 1 and m is typically a pretty large number, so the quantity being subtracted is small.
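Here is a small sketch of where that factor comes from, assuming the usual L2-regularized gradient in which \frac{\lambda}{m} W is added to the backprop gradient. Collecting the W terms in the update then gives (1 - \alpha \frac{\lambda}{m}) W - \alpha \cdot (backprop gradient). The function names and all the numbers below are hypothetical and only for illustration.

```python
def decay_factor(alpha, lam, m):
    """The factor (1 - alpha * lambda / m) that multiplies the weight."""
    return 1.0 - alpha * lam / m


def l2_update(w, dw_backprop, alpha, lam, m):
    """One gradient descent step on a single weight with L2 regularization."""
    dw = dw_backprop + (lam / m) * w  # gradient including the L2 term
    return w - alpha * dw


# Hypothetical hyperparameters and values, chosen only for illustration.
alpha, lam, m = 0.01, 0.7, 10_000
w, dw_backprop = 2.0, 0.3

factor = decay_factor(alpha, lam, m)
step_a = l2_update(w, dw_backprop, alpha, lam, m)
step_b = factor * w - alpha * dw_backprop  # same update, W terms collected

print(factor)          # slightly less than 1
print(step_a, step_b)  # the two forms agree up to floating-point rounding
```

With these values the subtracted quantity \alpha \frac{\lambda}{m} is 7e-7, so the factor is just below 1 and each step shrinks the weight slightly before the usual gradient step is applied.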

Roza_Hakimova · December 21, 2021, 6:08pm

Oh, sorry, I mistyped ((
Thank you for the clarification!!!
