Actually, now that I think an \epsilon harder about what David is saying there, the relationship between L and J can include additional terms in some cases, e.g. the L2 penalty term when we are doing L2 regularization. That term is added to the mean of the L values to get the final J value that is used for computing gradients in that case (see the formula sketched below). There are other forms of regularization that add different additional terms, e.g. L1 or “Lasso” regularization, which adds a term based on the sum of the absolute values of the weights. But the “base” unregularized cost J is just the mean of L over the samples.
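To make that concrete, here is the L2-regularized cost roughly as Prof Ng writes it in DLS C2 (the first term is the unregularized mean of the per-sample losses, the second is the added penalty over the weight matrices; I am quoting this from memory, so take the exact form with a grain of salt):

J = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{l} \| W^{[l]} \|_F^2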
One other side question worth mentioning: people frequently ask why the L2 regularization term is scaled by \frac{1}{m}. That makes it look a bit like an average, but of course the sum there is over the weights, not over the samples. I don’t know the answer for sure, and Prof Ng does not discuss this in the DLS C2 lectures (at least that I can recall), but one theory is that the purpose is to make the value of the hyperparameter \lambda orthogonal to the dataset size, so that a \lambda tuned on one dataset size still behaves reasonably when m changes. Here’s a thread which discusses that a bit more. And here’s a thread in which @conscell points out that Prof Ng does say more about this in the MLS lectures and confirms the “hyperparameter orthogonality” motivation for doing the scaling that way.
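Here is a minimal NumPy sketch of that cost computation, just to illustrate where the \frac{\lambda}{2m} scaling sits (the function name, the random weight matrices and the stand-in per-sample losses are purely hypothetical, not anything from the course assignments):

```python
import numpy as np

def l2_regularized_cost(per_sample_losses, weights, lam):
    """Mean of the per-sample L values plus the L2 penalty scaled by lambda / (2 * m)."""
    m = per_sample_losses.shape[0]
    base_cost = np.mean(per_sample_losses)                            # unregularized J
    l2_term = (lam / (2 * m)) * sum(np.sum(W ** 2) for W in weights)  # penalty over all W^[l]
    return base_cost + l2_term

# Illustration of the "hyperparameter orthogonality" idea: because the penalty
# is divided by m, its contribution shrinks as the dataset grows, so the same
# lambda applies proportionally less regularization when you have more data.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((1, 4))]
for m in (100, 1000):
    losses = rng.uniform(0.2, 0.8, size=m)  # stand-in per-sample L values
    print(m, l2_regularized_cost(losses, weights, lam=0.7))
```

One plausible reading of that behavior is that with more data you generally need less regularization anyway, so building the \frac{1}{m} factor into the penalty means a tuned \lambda transfers better across dataset sizes, which is consistent with what the threads linked above say.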