Regularization question

iterentyev · September 12, 2023, 10:21pm

In the week 1 Regularization video, the regularization term is divided by the number of samples m.

J = \dfrac{1}{m} \sum_{i=1}^{m} L^{(i)} + \dfrac{\lambda}{2m}\|W\|^2_F.

Why is this done?
If I add duplicate samples, the averaged loss term stays the same (m doubles, but so does the sum), while the regularization weight becomes 2 times smaller…
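The effect described in the question can be checked numerically. Below is a minimal sketch, assuming a linear model with a squared-error loss in place of the course's network and loss; the `cost` helper, the random data, and `lam=0.1` are all hypothetical choices made only for illustration.

```python
import numpy as np

def cost(W, X, y, lam):
    """Return (averaged loss term, L2 penalty term) separately.

    Assumption: squared-error loss on a linear model X @ W,
    used here only as a stand-in for the per-sample loss.
    """
    m = X.shape[0]
    data_term = np.mean((X @ W - y) ** 2)      # (1/m) * sum of per-sample losses
    reg_term = lam / (2 * m) * np.sum(W ** 2)  # (lambda / 2m) * ||W||_F^2
    return data_term, reg_term

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=(50, 1))
W = rng.normal(size=(3, 1))

# Original data set (m = 50)
data1, reg1 = cost(W, X, y, lam=0.1)

# Same data with every sample duplicated (m = 100)
X2 = np.vstack([X, X])
y2 = np.vstack([y, y])
data2, reg2 = cost(W, X2, y2, lam=0.1)

print("data term:", data1, "->", data2)  # unchanged up to rounding
print("reg term: ", reg1, "->", reg2)    # halved
```

With the same W and the same lambda, the averaged loss is unchanged by duplication while the penalty term is halved, which is the behaviour the question points out.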

TMosh · September 12, 2023, 10:40pm

Dividing by ‘m’ reduces the amount of regularization for large data sets.
This is a common practice that seems to work well; there is no mathematical basis for it.
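One way to see why the amount of regularization drops for large data sets, offered as a sketch rather than a justification: writing J for the cost and L^{(i)} for the loss on sample i (notation assumed here), the common factor 1/m can be pulled out of both terms.

```latex
J = \dfrac{1}{m} \sum_{i=1}^{m} L^{(i)} + \dfrac{\lambda}{2m}\|W\|^2_F
  = \dfrac{1}{m} \left( \sum_{i=1}^{m} L^{(i)} + \dfrac{\lambda}{2}\|W\|^2_F \right)
```

For a fixed m, the factor 1/m does not move the minimizer, so this is the same as adding a fixed penalty \dfrac{\lambda}{2}\|W\|^2_F to the summed loss. The sum grows with m while the penalty does not, so the penalty's relative influence shrinks as the data set grows.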
