I have been trying to wrap my head around this for a few days and can’t seem to grasp it thoroughly.
According to the lecture notes for C2W1, the activations that remain after the dropout mask has been applied have to be divided by keep_prob, i.e. if keep_prob is 0.5 and I have 4 units left, I double the values for all of them.
This is also mentioned in the programming exercise for Regularization:
During training time, divide each dropout layer by keep_prob to keep the same expected value for the activations. For example, if keep_prob is 0.5, then we will on average shut down half the nodes, so the output will be scaled by 0.5 since only the remaining half are contributing to the solution. Dividing by 0.5 is equivalent to multiplying by 2. Hence, the output now has the same expected value. You can check that this works even when keep_prob is other values than 0.5.
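Just so we are looking at the same thing, here is a minimal numpy sketch of the inverted dropout step as I understand it from the exercise (the function and variable names are my own, not the assignment's):

```python
import numpy as np

np.random.seed(1)

def dropout_forward(a, keep_prob):
    """Inverted dropout: zero out units, then rescale the survivors.

    `a` is an activation matrix and `keep_prob` the probability that a
    unit is kept (names chosen for illustration only).
    """
    mask = np.random.rand(*a.shape) < keep_prob  # True with prob keep_prob
    a = a * mask          # shut down roughly (1 - keep_prob) of the units
    a = a / keep_prob     # the rescaling step the exercise describes
    return a, mask

a = np.ones((4, 10000))
a_drop, _ = dropout_forward(a, keep_prob=0.5)
print(a_drop.mean())  # prints something close to 1.0, the original mean
```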
According to Wikipedia, the expected value of a random variable is, intuitively, its arithmetic mean…
Going with that definition, the expected value of 6 numbers is:
E = (a1 + a2 + a3 + a4 + a5 + a6) / 6
If I 'drop out' 50% of them (say a1 = a2 = a3 = 0 after the mask) and scale the remaining numbers by 2, I end up with:
E' = (2a4 + 2a5 + 2a6) / 6
E' = (a4 + a5 + a6) / 3
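To make my confusion concrete, here is the same arithmetic in numpy (the values for a1..a6 are made up for illustration):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # a1..a6, made-up values
E = a.mean()                                   # (a1 + ... + a6) / 6 = 3.5

mask = np.array([0, 0, 0, 1, 1, 1])            # the draw above: a1..a3 dropped
a_drop = a * mask / 0.5                        # the inverted-dropout rescaling
E_prime = a_drop.mean()                        # (a4 + a5 + a6) / 3 = 5.0

print(E, E_prime)                              # 3.5 5.0 -- not equal for this draw
```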
How is E' equal to E? What am I misunderstanding here?