Week 1, Understanding Dropout

In the video, Andrew mentioned: "Dropout can formally be shown to be an adaptive form of L2 regularization, but the L2 penalty on different weights is different, depending on the size of the activations being multiplied into that weight."
Can you please illustrate that?

Hey @Lina_Hourieh,
Welcome to the community. I am a little confused as to what exactly you mean by that statement, but let me try to break it down.

Just to be on the same page: the L2 penalty simply adds the sum of the squares of the weights to the cost function, scaled by the regularization parameter \lambda and divided by 2 * m (where m is the number of training examples and the 2 is just a mathematical convenience that makes the differentiation cleaner). Minimising this penalised cost pushes the weight values towards smaller magnitudes, which in turn reduces over-fitting.
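Written out in the course's usual notation (a sketch, where W^{[l]} is the weight matrix of layer l, m is the number of training examples and \mathcal{L} is the per-example loss), the L2-regularized cost looks roughly like this:

```latex
J_{\text{regularized}}
  = \underbrace{\frac{1}{m}\sum_{i=1}^{m}\mathcal{L}\big(\hat{y}^{(i)}, y^{(i)}\big)}_{\text{original cost}}
  + \underbrace{\frac{\lambda}{2m}\sum_{l=1}^{L}\big\lVert W^{[l]} \big\rVert_F^{2}}_{\text{L2 penalty}},
\qquad
\big\lVert W^{[l]} \big\rVert_F^{2} = \sum_{i}\sum_{j}\big(w_{ij}^{[l]}\big)^{2}
```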

Now, consider the statement from Prof Andrew that you quoted above.

He is hinting towards the fact that, in the presence of dropout, the weights are more evenly distributed and most of the neurons are assigned some importance (it may be small, but it is still some). In the absence of dropout, by contrast, the network might rely on a few neurons much more heavily by assigning larger weights to them. So, essentially, dropout breaks larger weight values down into smaller ones.
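To make "in the presence of dropout" concrete, here is a minimal numpy sketch of the inverted dropout step from the lectures; the layer shape, random seed and keep_prob value are just illustrative assumptions, not values from the course:

```python
import numpy as np

np.random.seed(0)

keep_prob = 0.8              # illustrative: probability that a neuron is kept
a = np.random.randn(5, 10)   # pretend activations of one hidden layer (5 units, 10 examples)

# Inverted dropout: zero out each neuron with probability (1 - keep_prob),
# then scale the survivors so the expected value of the activations is unchanged.
d = np.random.rand(*a.shape) < keep_prob
a = (a * d) / keep_prob

print(a)
```

Because any given neuron can be switched off on any training pass, a downstream unit cannot afford to pile a very large weight onto one particular input; it is pushed to spread that weight across several inputs instead.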

For instance, in the absence of dropout, let's say a single neuron is assigned a large weight of 9. When we implement dropout, that weight gets redistributed across, say, three neurons as 2, 3 and 4. Although 9 = 2 + 3 + 4, we have 9^2 = 81 > 2^2 + 3^2 + 4^2 = 29. So you can see how dropout redistributes the weights in a way that decreases the sum of the squares of the weights, which is exactly what the L2 penalty encourages. A quick sanity check of that arithmetic is shown below; let me know if this helps.
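The snippet below just verifies the toy numbers (9 versus 2, 3, 4) from the example; it is only illustrative arithmetic, not anything specific to the course code:

```python
# A single large weight vs. the same total spread across three neurons
concentrated = [9]
spread = [2, 3, 4]

print(sum(concentrated), sum(spread))      # 9 9  -> same total "importance"
print(sum(w ** 2 for w in concentrated))   # 81
print(sum(w ** 2 for w in spread))         # 29  -> smaller sum of squares, smaller L2 penalty
```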

Regards,
Elemento
