Confusion about regularization formula

Feihong_YANG · June 24, 2023, 11:55am

Can anyone tell me why the L2 regularization term takes the size of the data set into consideration? It makes sense for the loss term, which measures the difference between the predictions and the true values, because it is summed over all the training instances. But the L2 term seems to relate only to the sum of the weights.

pastorsoto · June 24, 2023, 12:55pm

Hi @Feihong_YANG great question!

The L2 regularization term doesn’t directly take into consideration the size of the data set. Instead, its main purpose is to prevent the coefficients of the model from becoming too large and leading to overfitting. This is achieved by adding the L2 regularization term, which is the sum of the squares of all the feature weights, to the loss function. This term encourages the weights to be small.

However, there is an indirect relationship between the L2 regularization and the size of the data set. When you calculate the overall cost function (the sum of the loss function and the L2 regularization term), you generally take the average of the loss function over all instances in your data set, whereas the L2 regularization term is a sum of the squared weights.
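Written out, the cost described above might look like the following. The notation is an assumption and does not come from this thread: $m$ is the number of instances, $\mathcal{L}$ the per-instance loss, $w_j$ the weights, and $\lambda$ the regularization strength.

```latex
J(w) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) + \lambda \sum_{j} w_j^{2}
```

Some course formulations instead write the penalty as $\frac{\lambda}{2m} \sum_{j} w_j^{2}$, which appears to be the form the original question refers to. In that variant the data set size $m$ shows up explicitly in the regularization term.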

As the size of your data set increases, the average loss decreases (assuming that your model is learning), but the L2 regularization term remains constant as it doesn’t depend on the number of instances. Therefore, the impact of the L2 regularization on the overall cost function becomes more pronounced with smaller data sets, and vice versa. This is why sometimes you might need to adjust the regularization strength (lambda) depending on the size of your data set.

So, while the L2 regularization term itself does not consider the size of the data set, its effect on the overall cost function does depend on the number of instances in your data set.
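To make this concrete, here is a minimal sketch in Python with NumPy, not code from the course. The function names `l2_penalty` and `cost`, the weight values, and the constant per-instance loss of 0.3 are all hypothetical and chosen only for illustration. It compares a penalty of the form `lam * sum(w**2)` with the `lam / (2 * m) * sum(w**2)` form that divides by the number of instances.

```python
import numpy as np

def l2_penalty(weights, lam, m=None):
    """L2 penalty on the weights (hypothetical helper).

    m is None -> lam * sum(w^2), independent of data set size.
    m given   -> lam / (2 * m) * sum(w^2), the variant scaled by m.
    """
    sum_sq = float(np.sum(np.square(weights)))
    if m is None:
        return lam * sum_sq
    return lam / (2 * m) * sum_sq

def cost(per_instance_losses, weights, lam, scale_by_m=False):
    """Average data loss plus the L2 penalty."""
    m = len(per_instance_losses)
    data_term = float(np.mean(per_instance_losses))
    penalty = l2_penalty(weights, lam, m if scale_by_m else None)
    return data_term + penalty

weights = np.array([0.5, -1.0, 2.0])  # illustrative values only
lam = 0.1

for m in (10, 100, 1000):
    losses = np.full(m, 0.3)  # assumed: every instance has loss 0.3
    unscaled = l2_penalty(weights, lam)
    scaled = l2_penalty(weights, lam, m)
    print(m, cost(losses, weights, lam), unscaled, scaled)
```

Under these assumptions the unscaled penalty is the same for every `m`, while the scaled penalty shrinks as `m` grows. With a real model the average loss would also change with the data, so this only isolates how each form of the penalty behaves.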

I hope this helps!

Feihong_YANG · June 24, 2023, 2:32pm

I see, and I think it makes sense. The larger the data set, the less overfitting matters, while the regularization parameter is still set to a constant value, so we need to take the size into consideration to reduce the impact of the regularization term and keep it matched with the loss function.

Cool thanks for the explanation! @pastorsoto

biswajit_mahalik · June 24, 2023, 7:02pm

Hi… can you please specify what an instance is?

rmwkwok · June 25, 2023, 12:39am

An instance means a sample.

Raymond
