Difference between cost function of L2 and dropout regularization - Week1

S.hejazinezhad · December 19, 2022, 9:42am

Hi everybody,
I am reviewing DLS2 these days and trying to fill the gaps in my earlier understanding. I am wondering what the difference is between the cost functions for the L2 and dropout methods. In summary, in the L2 case we add a second term to the cost function, and it is accounted for in both forward and backward propagation. This clearly shrinks the weights and helps prevent the model from overfitting. To observe a monotonic decrease in the cost versus the number of iterations, we must include that second term.
However, the story is a bit different for dropout, and it is a bit vague to me. We use dropout during training so that the model learns not to depend on any specific feature, which gives a more robust model. To plot the cost against the number of iterations and observe the decrease, we set the dropout keep probability to 1 (no regularization). I would be thankful if someone could explain how the model is able to show the decrease without the regularization effect at that point. We have practically removed the dropout effect, and it will then be turned back on, as was said in the lecture.
Thanks in advance.
Kind regards,

paulinpaloalto · December 19, 2022, 9:57am

With any form of regularization, including both L2 and dropout, you only include the regularization during training. You do not include it in inference mode (just making predictions with the model).

When you are using the cost to assess whether convergence is working, it’s a bit more complicated. In the case of L2, since the L2 term is simply an addition to the real cost, the changes in value of the regularized cost are a bit clearer. In the case of dropout, Prof Ng’s point in the lectures is that with dropout enabled, it’s literally a different network on each training iteration, so comparing the cost values across iterations is not “apples to apples”. That is why you need to compute the cost in “inference mode”, with dropout disabled, when you are using the cost to assess whether convergence is working or not.

You could view this as a purely mathematical point, and the question is whether it really matters that much from a practical standpoint. I have not tried any experiments to see how much of an effect dropout actually has on the behavior of the cost. The “keep probability” has to be a factor in that: if you’re using a relatively mild value (e.g. 0.9), it will matter less. If you ever run into this in “real life”, you can try some experiments and let us know what results you see comparing the two methods of judging convergence.
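To make the “apples to apples” point concrete, here is a minimal NumPy sketch, not the course's assignment code. The network shape, the helper names `forward_cost` and `l2_penalty`, and the values of `keep_prob` and `lambd` are all assumptions chosen for illustration. With the weights held fixed, the cost with dropout enabled changes on every call because a new random mask is drawn, while the cost with dropout disabled (and the L2-regularized cost) stays the same.

```python
import numpy as np

def forward_cost(X, Y, params, keep_prob=1.0, rng=None):
    """Cross-entropy cost of a tiny 2-layer network (hypothetical helper).

    Inverted dropout is applied to the hidden layer when keep_prob < 1.
    """
    W1, b1, W2, b2 = params
    A1 = np.maximum(0, W1 @ X + b1)              # ReLU hidden layer
    if keep_prob < 1.0:
        mask = rng.random(A1.shape) < keep_prob  # a fresh random mask on every call
        A1 = A1 * mask / keep_prob               # rescale to keep the expected activation
    A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))       # sigmoid output
    eps = 1e-12                                  # guards against log(0)
    return float(-np.mean(Y * np.log(A2 + eps) + (1 - Y) * np.log(1 - A2 + eps)))

def l2_penalty(params, lambd, m):
    """L2 term added to the cost: lambd / (2 m) times the sum of squared weights."""
    W1, _, W2, _ = params
    return lambd / (2 * m) * float(np.sum(W1 ** 2) + np.sum(W2 ** 2))

rng = np.random.default_rng(0)
m = 50
X = rng.normal(size=(4, m))
Y = (rng.random((1, m)) < 0.5).astype(float)
params = (rng.normal(size=(8, 4)), np.zeros((8, 1)),
          rng.normal(size=(1, 8)), np.zeros((1, 1)))

# Same weights, dropout enabled: each call evaluates a different "thinned" network.
dropout_costs = [forward_cost(X, Y, params, keep_prob=0.5, rng=rng) for _ in range(5)]
# Same weights, dropout disabled: the cost is deterministic.
inference_costs = [forward_cost(X, Y, params) for _ in range(5)]
# L2: a deterministic, non-negative addition to the unregularized cost.
l2_cost = inference_costs[0] + l2_penalty(params, lambd=0.7, m=m)

print("dropout on :", dropout_costs)
print("dropout off:", inference_costs)
print("L2 cost    :", l2_cost)
```

The spread in `dropout_costs` is noise from the masks alone, since no training step happens between calls. That noise is what gets mixed into a cost curve plotted with dropout left on.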

paulinpaloalto · December 19, 2022, 10:13am

Maybe the simpler way to state the answer is that the actual model that is the result of training at any point in the training process (any number of iterations) is the model with dropout disabled, right? That is the actual model in question. In other words the model that we will actually use for making predictions.

One other higher-level point to make is that looking at the cost is actually a fairly crude metric for the behavior of convergence. Prediction accuracy is actually the gold standard. Note that because accuracy is quantized (each prediction is thresholded to 0 or 1), a lower cost does not necessarily mean higher accuracy.
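A small made-up example of that last point, assuming binary labels, a 0.5 decision threshold, and cross-entropy cost. The two prediction vectors and the helper names are invented for illustration and do not come from any real model.

```python
import numpy as np

def cross_entropy(y, p):
    """Mean binary cross-entropy between labels y and predicted probabilities p."""
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def accuracy(y, p, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    return float(np.mean((p > threshold) == (y == 1)))

y = np.array([1.0, 1.0, 0.0, 0.0])

# Model A: barely on the right side of the threshold for every example.
p_a = np.array([0.51, 0.51, 0.49, 0.49])
# Model B: very confident on three examples, wrong on the fourth.
p_b = np.array([0.99, 0.99, 0.01, 0.55])

cost_a, acc_a = cross_entropy(y, p_a), accuracy(y, p_a)
cost_b, acc_b = cross_entropy(y, p_b), accuracy(y, p_b)

print(f"A: cost={cost_a:.3f}, accuracy={acc_a:.2f}")
print(f"B: cost={cost_b:.3f}, accuracy={acc_b:.2f}")
```

Here model B has the lower cost but also the lower accuracy, because the cost rewards confidence on the examples it gets right while accuracy only counts which side of the threshold each prediction falls on.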

S.hejazinezhad · December 19, 2022, 11:07am

OK, perfect. Regarding your first reply, I agree with you. The L2 case is easier to explain and more obvious than the dropout case, at least when looking at the cost along the iteration axis. Anyway, it is confirmed that the robustness gained from dropout remains after the training iterations even when dropout is disabled!
Thanks
