Dividing the summation in the logistic regression cost function by ‘m’ ensures that its scale does not change as the number of data points increases. In other words, the effect of additional data points on the scale of the cost function is already taken care of: the averaged term stays on the same scale no matter how many data points we add.
Also, adding more data has ‘no impact’ on the L2 regularization summation: it runs over the parameters, not over the data points, so it is the same irrespective of how many data points we have. Meaning, as we add more data points, the scale issue in the first term is already handled and ‘there is no change in the second term’. For this reason, dividing the first term by ‘m’ should be enough to handle the scale issue as we add more data points.
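To make the observation above concrete, here is a small sketch in plain Python. Everything in it is an assumption made for illustration: the one-feature model, the synthetic data from `make_data`, the fixed parameters `theta`, and `lam = 1.0`. It only checks the scale behaviour described above for fixed parameters; it is not taken from any particular course or library.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_terms(theta, xs, ys):
    """Return (log-loss averaged over m, undivided L2 sum) for a
    one-feature model h(x) = sigmoid(theta[0] + theta[1] * x)."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(theta[0] + theta[1] * x)
        total += -(y * math.log(h) + (1 - y) * math.log(1 - h))
    # The L2 sum runs over the parameters only (bias excluded by convention).
    l2_sum = sum(t * t for t in theta[1:])
    return total / m, l2_sum

def make_data(m, seed=0):
    """Hypothetical synthetic data set with m points."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3, 3) for _ in range(m)]
    ys = [1 if rng.random() < sigmoid(0.5 + 1.5 * x) else 0 for x in xs]
    return xs, ys

theta = [0.2, 1.0]  # fixed, illustrative parameters
lam = 1.0           # illustrative regularization strength

results = {}
for m in (100, 1000, 10000):
    xs, ys = make_data(m)
    mean_loss, l2_sum = cost_terms(theta, xs, ys)
    results[m] = (mean_loss, l2_sum)
    print(f"m={m:6d}  mean loss={mean_loss:.4f}  "
          f"L2 sum={l2_sum:.4f}  "
          f"lam/2 * sum={lam / 2 * l2_sum:.6f}  "
          f"lam/(2m) * sum={lam / (2 * m) * l2_sum:.6f}")
```

With the parameters held fixed, the averaged loss should stay on roughly the same scale for every `m` and the L2 sum is identical, while the penalty that is also divided by ‘m’ gets smaller as `m` grows.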
So, why are we dividing the L2 Regularization term by ‘m’?