Clarification on Cost Discrepancy Between L2 Regularization and Dropout

dtonhofer · March 25, 2025, 10:52am

Isn’t it simply because, with L2 regularization, COST includes the squared Frobenius norms of all the weight matrices? That is, at every point in the space of weights, COST is higher by a (scaled) sum of all the w^{2}, which can be large-ish. One is adding a high-dimensional paraboloid centered on the “all weights 0” point.
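
Written out, and assuming the usual \lambda/(2m) scaling (the symbols J_{CE} for the unregularized cross-entropy cost, \lambda, m and W^{[l]} are my notation, not something stated above), the L2 cost would look roughly like this:

$$
J_{\text{L2}} = J_{\text{CE}} + \frac{\lambda}{2m} \sum_{l} \lVert W^{[l]} \rVert_F^2,
\qquad
\lVert W^{[l]} \rVert_F^2 = \sum_{i,j} \left( w^{[l]}_{ij} \right)^2
$$

The second term is non-negative and is zero only when all weights are 0, which is the paraboloid.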

The NN that is used with dropout has no such penalty term in its COST, so its COST is correspondingly lower.
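
To make the offset concrete, here is a minimal NumPy sketch. The data and weights are random, and the helper names (`cross_entropy_cost`, `l2_penalty`) and the lambd/(2m) scaling are assumptions for illustration, not the assignment's actual code. It only shows that the same predictions give a higher reported COST once the penalty is added.

```python
import numpy as np

def cross_entropy_cost(a_out, y):
    """Mean binary cross-entropy over the m examples (one per column)."""
    m = y.shape[1]
    return float(-np.sum(y * np.log(a_out) + (1 - y) * np.log(1 - a_out)) / m)

def l2_penalty(weights, lambd, m):
    """(lambd / (2m)) times the sum of squared Frobenius norms (assumed scaling)."""
    return float(lambd / (2 * m) * sum(np.sum(np.square(W)) for W in weights))

# Made-up labels, predictions and weight matrices, purely for illustration.
rng = np.random.default_rng(0)
m = 5
y = rng.integers(0, 2, size=(1, m))
a_out = rng.uniform(0.05, 0.95, size=(1, m))
weights = [rng.standard_normal((4, 3)), rng.standard_normal((1, 4))]

base_cost = cross_entropy_cost(a_out, y)      # what a run without L2 reports
penalty = l2_penalty(weights, lambd=0.7, m=m)  # the extra non-negative term
l2_cost = base_cost + penalty                  # what a run with L2 reports

print(f"cross-entropy only: {base_cost:.4f}")
print(f"with L2 penalty:    {l2_cost:.4f}")
```

The gap between the two printed numbers is exactly the penalty, so it grows with lambd and with the size of the weights.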

Note that this doesn’t really matter, as COST is just an arbitrary value that tells us how well we are currently doing (relative to other solutions), which is why its formula can be chosen rather freely.
