Coursera - DLS - C2W1L04 - Do we need to divide L2 Regularization term by 'm'?

Dividing the logistic regression cost function summation by ‘m’ ensures that its scale does not change as the number of data points increases. In other words, the effect of additional data points on the scale of the cost function is already taken care of: the logistic regression cost stays on the same scale as we add more data points.

Also, adding more data has ‘no impact’ on the L2 regularization term summation: it is the same regardless of how many data points we have. So, as we add more data points, the scale issue in the first term is already handled and ‘there is no change in the second term’. For this reason, dividing the first term by ‘m’ is enough to handle the scale issue as we add more data points.
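To make the two terms described above concrete, here is a minimal sketch in plain Python. The names `mean_logistic_loss`, `l2_sum`, `l2_penalty_with_m` and `make_data` are hypothetical helpers written for this post, not code from the course, and the `lambd / (2 * m)` scaling is my assumption about the form used in the lecture.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_logistic_loss(w, b, X, y):
    """First term: per-example cross-entropy losses summed, then divided by m."""
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        a = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
        total += -(y_i * math.log(a) + (1 - y_i) * math.log(1 - a))
    return total / m

def l2_sum(w):
    """Second term's summation: depends only on the weights, not on m."""
    return sum(w_j ** 2 for w_j in w)

def l2_penalty_with_m(w, lambd, m):
    """L2 penalty scaled by lambda / (2m) (assumed lecture form)."""
    return (lambd / (2 * m)) * l2_sum(w)

def make_data(m, seed=0):
    """Synthetic 2-feature dataset, purely for illustration."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(m)]
    y = [1 if x[0] + x[1] > 0 else 0 for x in X]
    return X, y

# Arbitrary illustrative values for the weights, bias and lambda.
w, b, lambd = [0.5, -0.25], 0.1, 0.7
for m in (100, 10_000):
    X, y = make_data(m)
    print(m, mean_logistic_loss(w, b, X, y), l2_sum(w), l2_penalty_with_m(w, lambd, m))
```

For both values of m, `l2_sum(w)` prints the same number, while the `lambd / (2 * m)` penalty is 100 times smaller for the larger dataset. That shrinking second term is what my question is about.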

So, why are we dividing the L2 Regularization term by ‘m’?

Hi @aks.edu

Here is a thread discussing the same topic.

Yes, this question has come up pretty frequently over time. Here’s another historical thread that makes the same point as in the link Kic gave, but in perhaps a bit more detail.