Hello,
At the end of Week 1, Assignment 2, does anyone know why dropout outperforms L2 regularization on the test data, and whether that's always the case? Thank you very much.
Hey @X0450,
The fact that dropout regularization beats L2 regularization in this assignment doesn't mean dropout is always better than L2. Several arguments come to mind that support this statement.
The first is that in this assignment you were given `lambda = 0.7` for L2 and `keep_prob = 0.86` for dropout. What if you weren't given these values and had to find them via hyper-parameter tuning? In that scenario, it is quite likely you would find a number of combinations of `lambda` and `keep_prob` for which L2 regularization outperforms dropout.
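To make the tuning idea concrete, here is a minimal sketch of searching over `lambda` values for an L2-regularized model. It uses a tiny logistic regression on made-up synthetic data rather than the assignment's deeper network, and the candidate values and helper names (`train_l2`, `accuracy`) are my own illustration, not the assignment's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data: 2 features, roughly linearly separable.
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(float)
X_val = rng.normal(size=(100, 2))
y_val = (X_val[:, 0] + X_val[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_l2(X, y, lambd, lr=0.1, epochs=200):
    """Gradient descent on logistic loss plus (lambd / 2m) * ||w||^2."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        grad_w = X.T @ (p - y) / m + (lambd / m) * w  # L2 term in the gradient
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def accuracy(w, b, X, y):
    return np.mean((sigmoid(X @ w + b) > 0.5) == y)

# Sweep a few candidate lambda values and keep whichever scores best
# on the held-out validation split.
best = max((accuracy(*train_l2(X_train, y_train, l), X_val, y_val), l)
           for l in [0.0, 0.1, 0.7, 3.0])
print("best val accuracy %.2f at lambda=%s" % best)
```

The same loop extends naturally to a joint sweep over `lambda` and `keep_prob` when both techniques are on the table.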
Consider the above results for the second argument. L2 performs better than dropout on the train set, but worse on the test set. This means dropout has the stronger regularizing effect here: it brings down overfitting to a greater extent and improves test accuracy more. But if you increased `lambda` in L2 to get a stronger regularizing effect, it might well lead to L2 outperforming dropout.
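For intuition on the two knobs being compared, here is a small sketch of both mechanisms: inverted dropout with `keep_prob` and the L2 cost penalty scaled by `lambda`. The constants mirror the assignment (`keep_prob = 0.86`, `lambda = 0.7`), but the arrays are made-up stand-ins for activations and weights:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 5))  # made-up activations of some hidden layer
keep_prob = 0.86

# Inverted dropout: zero out units with probability 1 - keep_prob, then
# rescale the survivors so the expected activation is unchanged.
D = rng.random(A.shape) < keep_prob
A_drop = (A * D) / keep_prob

# L2: add (lambd / 2m) * sum ||W||^2 to the cost; a larger lambda pulls
# the weights harder toward zero, i.e. a stronger regularizing effect.
lambd, m = 0.7, 100
W = [rng.normal(size=(5, 4)), rng.normal(size=(1, 5))]
penalty = (lambd / (2 * m)) * sum(np.sum(np.square(w)) for w in W)

print(A_drop.shape, round(penalty, 4))
```

Doubling `lambd` doubles the penalty, which is why raising it is the direct lever for strengthening L2's effect.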
No one can say for sure which regularization technique will perform better for your dataset or application. Ultimately, you have to try them and find out for yourself: on one dataset L2 may perform better, and on another dropout may. The same holds for other regularization techniques. I hope this helps.
Regards,
Elemento