Coursera - DLS - C2W1L04 - Do we need to divide L2 Regularization term by 'm'?

aks.edu · January 10, 2026, 8:02am

Dividing the logistic regression cost summation by ‘m’ ensures that its scale does not change as the number of data points increases. In other words, the effect of additional data points on the scale of the logistic regression cost function is already taken care of.

Also, there is ‘no impact’ on the L2 regularization summation as we add more data: it sums over the weights, not the examples, so it is the same irrespective of how many data points we have. So as we add more data points, the scale issue in the first term is already handled and ‘there is no change in the second term’. For this reason, dividing the first term by ‘m’ should be enough to handle the scale issue.

So, why are we dividing the L2 Regularization term by ‘m’?
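
To make the two terms in the question concrete, here is a minimal NumPy sketch. The helper names (`cross_entropy_mean`, `l2_penalty`), the random data, and the values of `w`, `b` and `lambd` are all made up for illustration and are not taken from the course notebooks. It assumes the regularized cost has the form (1/m) Σ loss + (λ/2m) ‖w‖², and checks what happens to each term when the data set is duplicated so that m doubles.

```python
import numpy as np

def cross_entropy_mean(w, b, X, y):
    """Average logistic loss: (1/m) * sum of the per-example losses."""
    z = X @ w + b
    a = 1.0 / (1.0 + np.exp(-z))
    eps = 1e-12  # guards against log(0)
    return float(-np.mean(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps)))

def l2_penalty(w, lambd, m=None):
    """(lambd / 2) * ||w||^2, divided by m as well when m is given."""
    scale = lambd / 2.0 if m is None else lambd / (2.0 * m)
    return float(scale * np.sum(w ** 2))

# Hypothetical data and parameters, chosen only for the demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (rng.random(100) < 0.5).astype(float)
w = np.array([0.5, -1.0, 2.0])
b = 0.1
lambd = 0.7

# Duplicate every example so that m goes from 100 to 200.
X2 = np.vstack([X, X])
y2 = np.concatenate([y, y])

loss_m = cross_entropy_mean(w, b, X, y)      # first term with m = 100
loss_2m = cross_entropy_mean(w, b, X2, y2)   # first term with m = 200
pen_plain = l2_penalty(w, lambd)             # penalty without the 1/m factor
pen_m = l2_penalty(w, lambd, m=100)          # penalty with 1/m, m = 100
pen_2m = l2_penalty(w, lambd, m=200)         # penalty with 1/m, m = 200

print("averaged loss:", loss_m, loss_2m)
print("penalty without 1/m:", pen_plain)
print("penalty with 1/m:", pen_m, pen_2m)
```

Under these assumptions the averaged loss is unchanged when the data is duplicated, the penalty without 1/m does not depend on m at all, and the penalty with 1/m halves when m doubles. Note also that `l2_penalty(w, lambd, m=100)` equals `l2_penalty(w, lambd / 100)`, so for a fixed m the two formulations differ only by a rescaling of lambda.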

Kic · January 10, 2026, 1:58pm

Hi @aks.edu

Here is a thread discussing the same topic.

paulinpaloalto · January 10, 2026, 8:35pm

Yes, this question has come up pretty frequently over time. Here’s another historical thread that makes the same point as in the link Kic gave, but in perhaps a bit more detail.
