About the \frac{\lambda}{m} w_j term

If the question is why the L2 regularization term is scaled by \frac{\lambda}{m} rather than just plain \lambda, I think you could have done it either way; it's just a constant, after all. I've never seen an "official" explanation of this, but my theory is that the scaling makes the choice of the \lambda hyperparameter orthogonal to the size of the training set. Note that when you train, you may work with several differently sized training sets, e.g. a smaller subset early on to speed up experimentation while you're playing with hyperparameters, and then the full training set once you feel like your hyperparameter choices are getting close. It would be awkward if you had to tune \lambda separately for the two differently sized datasets.

The other intuition here is that the purpose of L2 regularization is to eliminate or mitigate overfitting, and the other strategy for reducing overfitting is to get more training data. With the factor of \frac{1}{m}, the L2 term goes to zero in the limit as m \rightarrow \infty. So if you have the ability to add more data (which is not always practical), you wouldn't also have to do further fiddling with the \lambda value, in theory at least.
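To make the size-invariance point concrete, here's a small sketch (the weight vector and \lambda value are made up purely for illustration) comparing the unscaled penalty \lambda \sum_j w_j^2 with the scaled penalty \frac{\lambda}{m} \sum_j w_j^2 as the training-set size m grows:

```python
import numpy as np

lam = 0.1                                # illustrative lambda value
w = np.array([0.5, -1.2, 0.8])           # illustrative fixed weight vector
l2 = np.sum(w ** 2)                      # sum of squared weights

for m in [100, 1_000, 10_000, 100_000]:
    unscaled = lam * l2                  # independent of m: same pull on the weights at any data size
    scaled = lam / m * l2                # shrinks as m grows, vanishing as m -> infinity
    print(f"m={m:>7}: unscaled={unscaled:.6f}  scaled={scaled:.10f}")
```

The unscaled penalty exerts the same pull on the weights no matter how much data you have, while the scaled one fades as the dataset grows, which matches the intuition that more data reduces the need for regularization.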

But don’t forget my disclaimer from earlier: this is just my theory, and I don’t have any external evidence to support it. :nerd_face: