Hello everyone,
There is a point that I don't understand: why, in the regularization formula, do we divide the term by a factor of m, the number of training examples? Since we sum over the weights associated with our features, the sum runs from 1 to n_x, as shown in the image below, so I think it would be more logical to divide by n_x instead.
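For reference, this is the formula I mean (writing it out here in case the image doesn't render, so please correct me if I have copied it wrong):

J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{j=1}^{n_x} w_j^2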
It is an interesting question. I do not know the answer, but perhaps we will get lucky and someone who knows more will chime in; this has come up a number of times in the past. One high-level point is that not everyone formulates L2 regularization in that way. E.g., here's a lecture from Prof Geoff Hinton which covers L2 regularization, and you'll see that he uses the factor \frac{\lambda}{2} times the sum of the squares of all the weights.
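In other words, the two penalty terms being compared look roughly like this (my paraphrase, so take the exact notation with a grain of salt):

\text{Prof Ng:}\ \frac{\lambda}{2m} \sum_{j=1}^{n_x} w_j^2 \qquad\qquad \text{Prof Hinton:}\ \frac{\lambda}{2} \sum_{j} w_j^2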
So apparently this is a choice that Prof Ng has made, and there are other ways to make it. One idea I can think of that might motivate scaling the factor by \frac{1}{m} is that it perhaps makes the choice of \lambda a bit easier: you can pick one value and it will still work if you change the size of your dataset. With Prof Ng's formulation, the effect of L2 regularization decreases as your training dataset gets larger. And of course we know that one of the primary ways to address overfitting is to increase the size of your training set; in the limit as m \rightarrow \infty, the need for regularization goes to zero. Just a thought, which maybe gives some intuition. As I mentioned above, I say this with the disclaimer that I don't really know the definitive answer.
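Here's a quick numpy sketch of that point. The weight values and \lambda are just made-up numbers to illustrate the scaling, not anything from the course:

```python
import numpy as np

# Made-up weight vector and regularization strength, purely for illustration.
w = np.array([0.5, -1.2, 0.8, 2.0])
lambd = 0.7

for m in [100, 1_000, 10_000, 100_000]:
    # With the 1/m factor, the penalty added to the cost is (lambda / (2*m)) * sum(w^2),
    # so for a fixed lambda its contribution shrinks as the training set grows.
    penalty_with_m = (lambd / (2 * m)) * np.sum(np.square(w))
    # A formulation without the 1/m factor, e.g. (lambda / 2) * sum(w^2), stays constant.
    penalty_without_m = (lambd / 2) * np.sum(np.square(w))
    print(f"m = {m:>7d}: with 1/m -> {penalty_with_m:.6f}, without 1/m -> {penalty_without_m:.6f}")
```

So with the 1/m scaling you get a built-in "less regularization when you have more data" effect for a fixed \lambda, whereas without it you would have to retune \lambda as the dataset size changes.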
I like the intuition you gave about this; I think it makes sense to me now.