Hi!
Why are we sure that lambda/m is less than 0? Does that mean the values have to be chosen specifically so that the term lambda/m is less than 0?
Thank you in advance.
I think you must be misinterpreting something. Are you referring to something Prof Ng says in the lectures? Or is it some wording in the assignment? Please give us a reference (name of the lecture and time offset or the section of the assignment) where you see this statement.
Note that \lambda is a positive constant: it is a hyperparameter that you choose to control the strength of L2 regularization. Larger values of \lambda suppress the learned weight values more strongly and thus produce more of a regularization effect.
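To make the "larger \lambda suppresses the weights" point concrete, here is a minimal sketch. It is not from the course: the one-weight toy cost J(w) = 0.5 (w - 3)^2 + (\lambda / (2m)) w^2, the function name `fit_weight`, and all the numbers are assumptions I made up for illustration.

```python
def fit_weight(lam, m=100, alpha=0.1, steps=2000):
    """Gradient descent on a hypothetical toy cost:
    J(w) = 0.5 * (w - 3)**2 + (lam / (2 * m)) * w**2
    """
    w = 0.0
    for _ in range(steps):
        # Gradient of the data term plus gradient of the L2 term.
        grad = (w - 3.0) + (lam / m) * w
        w -= alpha * grad
    return w


# Without regularization the minimum is at w = 3; larger lam pulls w toward 0.
for lam in (0.0, 10.0, 100.0, 1000.0):
    print(f"lambda = {lam:>6}: learned w = {fit_weight(lam):.4f}")
```

For this toy cost the minimum sits at w = 3 / (1 + \lambda / m), so the learned weight should shrink toward 0 as \lambda grows.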
One interesting question you may ask is why they bother dividing by the number of samples m, especially since the actual L2 regularization term is independent of the number of samples. I don’t know the definitive answer on that question, since not everyone specifies it that way. It just modifies the value that you would choose for \lambda of course. As the size of your training set increases, you’ll get less of a regularization effect with any given choice of \lambda. One way to decrease overfitting is to get more training data and in the limit of an infinite amount, you (in theory) would not need regularization. But this is just my guess, not something I have heard Prof Ng actually say.
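Here is a small sketch of what I mean about m, assuming the penalty is written as (\lambda / (2m)) times the sum of the squared weights. The helper name `l2_penalty`, the weights, and the value of \lambda are made up for illustration.

```python
def l2_penalty(weights, lam, m):
    """L2 term of the cost: (lam / (2 * m)) * sum of squared weights."""
    return (lam / (2 * m)) * sum(w * w for w in weights)


weights = [0.5, -1.2, 2.0]  # made-up weights, held fixed
lam = 0.7                   # made-up regularization parameter

# Same weights and same lambda, but a growing training set size m.
for m in (100, 10_000, 1_000_000):
    print(f"m = {m:>9,}: penalty = {l2_penalty(weights, lam, m):.3e}")
```

With the weights and \lambda held fixed, the penalty shrinks in proportion to 1/m, which is the "less regularization as the training set grows" effect.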
Sure, here is where Prof. Ng states it:
Lecture “Regularization”, time 8:43
Though I’d like to correct my question (I am sorry). He’s saying that (1 - alpha*lambda/m) is less than 1. I’m still confused about how we know this.
Thank you!
I suggest you listen again more carefully. What he says is that (1 - \alpha \frac{\lambda}{m}) is slightly less than 1. That is because you are taking 1 and subtracting something positive from it: \alpha, \lambda and m are all positive. How much less than 1 depends on the particular values you have chosen for the hyperparameters \alpha (the learning rate) and \lambda (the regularization parameter) and on the value of m, but in most cases \alpha will be a decimal less than 1 and m is typically a pretty large number, so the quantity you subtract is small.
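If it helps, here is a minimal sketch that plugs in numbers. The values of \alpha, \lambda and m, the gradient value, and the function names are all made up for illustration; they are not from the lecture. It also checks that the update written with the explicit (\lambda / m) w gradient term gives the same result as the form where w is first multiplied by (1 - \alpha \lambda / m).

```python
def decay_factor(alpha, lam, m):
    """The factor that multiplies w in the regularized update."""
    return 1.0 - alpha * lam / m


def update_explicit(w, grad_loss, alpha, lam, m):
    """w := w - alpha * (grad_loss + (lam / m) * w)"""
    return w - alpha * (grad_loss + (lam / m) * w)


def update_weight_decay(w, grad_loss, alpha, lam, m):
    """w := (1 - alpha * lam / m) * w - alpha * grad_loss"""
    return decay_factor(alpha, lam, m) * w - alpha * grad_loss


# Illustrative values only: small learning rate, positive lambda, large m.
alpha, lam, m = 0.01, 0.7, 10_000
w, grad_loss = 2.5, -0.3  # made-up weight and unregularized gradient

print("decay factor:     ", decay_factor(alpha, lam, m))
print("explicit update:  ", update_explicit(w, grad_loss, alpha, lam, m))
print("weight-decay form:", update_weight_decay(w, grad_loss, alpha, lam, m))
```

The factor comes out just below 1, and the two update forms agree up to floating-point rounding. Note that with unusual choices, such as a very large \lambda together with a tiny m, the factor could be far from 1 or even negative, which is why the statement depends on the hyperparameter values.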
Oh, sorry, I mistyped ((
Thank you for the clarification!!!