In the Week 1 Regularization video, the regularization term is divided by the number of samples m.
J = \dfrac{1}{m} \sum_{i=1}^{m} L^{(i)} + \dfrac{\lambda}{2m}\|W\|^2_F.
Why is it done?
If I duplicate every sample, the data part of the cost function stays the same (the sum doubles, and so does m), but the effective regularization weight becomes half as large…
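To make the duplicate-sample argument concrete, here is a minimal sketch in plain Python. The linear model, the squared-error loss, the toy data, and the function name `cost` are my own assumptions for illustration, not anything from the video; only the 1/m averaging and the λ/(2m) scaling follow the formula above.

```python
def cost(W, X, y, lam):
    """Return (data term, penalty term) for a toy linear model.

    Assumed for illustration: squared-error loss and a weight vector W,
    so the Frobenius norm reduces to the ordinary Euclidean norm.
    """
    m = len(X)
    # Data term: per-sample losses averaged over the m samples.
    data = sum(
        (sum(w * x for w, x in zip(W, xi)) - yi) ** 2
        for xi, yi in zip(X, y)
    ) / m
    # Penalty term: squared norm of the weights scaled by lambda / (2m).
    penalty = lam / (2 * m) * sum(w * w for w in W)
    return data, penalty


# Hypothetical toy values.
W = [0.5, -1.0]
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, 0.0, -1.0]
lam = 0.1

data_1, pen_1 = cost(W, X, y, lam)
data_2, pen_2 = cost(W, X + X, y + y, lam)  # every sample duplicated once

print("data term:   ", data_1, "->", data_2)
print("penalty term:", pen_1, "->", pen_2)
```

With the same λ, the data term is identical for the original and the duplicated set, while the penalty term is halved, which is exactly the effect I am asking about.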