A lecture issue in dropout regularization implementation in week 1

Juan_Olano · December 4, 2022, 4:56pm

Let’s try to understand a3 /= keep_prob together. I’ll use my own words to help you build some intuition for it:

We know that with Dropout we shut down some nodes in the layer. For example, in a 50-unit layer with keep_prob = 0.8 (each unit is kept with probability 0.8, i.e. a dropout rate of 20%), we shut down about 10 units on average and keep about 40. It is important to understand here that we don’t physically remove the shut-down units; instead, we set their activations to zero.
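
A minimal NumPy sketch of that step, assuming a single example, random stand-in activations for a3, and a mask named d3 (all illustrative choices). The mask is random, so the exact count of shut-down units changes from pass to pass, and the array keeps its shape because dropped units are zeroed rather than removed.

```python
import numpy as np

# Hypothetical setup matching the example: a 50-unit layer and keep_prob = 0.8.
n_units = 50
keep_prob = 0.8
rng = np.random.default_rng(0)

a3 = rng.random((n_units, 1))  # stand-in activations for layer 3 (one example)

# Keep each unit independently with probability keep_prob.
d3 = rng.random(a3.shape) < keep_prob
a3_dropped = a3 * d3  # shut-down units become zero; nothing is removed

print("units shut down in this pass:", int((~d3).sum()))
print("shape is unchanged:", a3_dropped.shape)

# Over many passes, the average number of shut-down units approaches
# (1 - keep_prob) * n_units = 10.
counts = [int((rng.random((n_units, 1)) >= keep_prob).sum()) for _ in range(10_000)]
print("average shut down:", np.mean(counts))
```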

Next, once Dropout has been applied, we calculate Z = W * a + b for the next layer (here * means the matrix product). In the case of a3, we would be calculating Z4 = W4 * a3 + b4, right?

Remember that we have shut off about 10 units in a3, so their contribution to Z4 is zero.
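
A small NumPy check of that claim, assuming a hypothetical layer 4 with 4 units and random stand-in values for a3, W4 and b4. Each zeroed entry of a3 multiplies its column of W4 by zero, so computing Z4 with only the kept units’ columns gives the same result.

```python
import numpy as np

# Hypothetical shapes: layer 3 has 50 units, layer 4 has 4 units (assumed).
rng = np.random.default_rng(1)
keep_prob = 0.8
a3 = rng.random((50, 1))
W4 = rng.standard_normal((4, 50))
b4 = rng.standard_normal((4, 1))

d3 = rng.random(a3.shape) < keep_prob  # dropout mask
a3_dropped = a3 * d3

Z4 = W4 @ a3_dropped + b4

# Only the kept units contribute: using just their columns gives the same Z4.
kept = d3.ravel()
Z4_kept_only = W4[:, kept] @ a3_dropped[kept] + b4
print(np.allclose(Z4, Z4_kept_only))  # True
```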

And here comes a key question:
What will happen to Z4 if we use a3 with just “80% of its power”? On average, the W4 * a3 part of Z4 will shrink to 80% of what it would be without Dropout, right?
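
The same point as a short derivation. It treats a3 as fixed and writes the dropout mask as d^{[3]} (an assumed notation), whose entries are 1 with probability p = keep_prob and 0 otherwise; ⊙ is element-wise multiplication.

```latex
\tilde{a}^{[3]} = d^{[3]} \odot a^{[3]}, \qquad d^{[3]}_j \sim \mathrm{Bernoulli}(p)

\mathbb{E}\big[\tilde{a}^{[3]}_j\big] = p \, a^{[3]}_j

\mathbb{E}\big[Z^{[4]}\big] = W^{[4]} \, \mathbb{E}\big[\tilde{a}^{[3]}\big] + b^{[4]} = p \, W^{[4]} a^{[3]} + b^{[4]}
```

With p = 0.8, the W4 * a3 part of Z4 is, in expectation, only 80% of its no-Dropout value.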

How can we solve this?
Well, if we divide a3 by keep_prob, meaning a3 /= keep_prob, that will “give more power” to the 80% of units that remain active, right?

Think about this: what happens when you divide 1 by 0.8? 1/0.8 = 1.25 … it is bumped up!
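
In general form, dividing by keep_prob undoes the shrinkage in expectation. A short derivation, treating a3 as fixed and writing the dropout mask as d^{[3]} with entries that are 1 with probability p = keep_prob and 0 otherwise (notation assumed here):

```latex
\mathbb{E}\!\left[\frac{d^{[3]}_j \, a^{[3]}_j}{p}\right] = \frac{p \, a^{[3]}_j}{p} = a^{[3]}_j
\quad\Longrightarrow\quad
\mathbb{E}\big[Z^{[4]}\big] = W^{[4]} a^{[3]} + b^{[4]}
```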

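So when we do a3 /= keep_prob, or in the example a3 /= 0.8, we are scaling every active unit up by 1.25 to ‘compensate’ for the missing units. Since 0.8 × 1.25 = 1, the expected value of a3, and therefore of Z4, is the same as it would be without Dropout.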
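
A Monte Carlo sketch of that compensation, assuming NumPy, random stand-in values, and a hypothetical helper dropout_forward (not the course’s code). It averages Z4 over many random masks, once without the division and once with it, and compares both averages with the no-Dropout Z4.

```python
import numpy as np

def dropout_forward(a, keep_prob, rng, inverted=True):
    """Hypothetical helper: apply dropout to activations `a` at training time.

    Each unit is kept with probability keep_prob and zeroed otherwise.
    With inverted=True the survivors are also divided by keep_prob.
    """
    d = rng.random(a.shape) < keep_prob  # dropout mask
    a = a * d
    if inverted:
        a = a / keep_prob  # the a3 /= keep_prob step
    return a

# Hypothetical sizes and random stand-in values.
rng = np.random.default_rng(2)
keep_prob = 0.8
a3 = rng.random((50, 1))
W4 = rng.standard_normal((4, 50))
b4 = rng.standard_normal((4, 1))

Z4_no_dropout = W4 @ a3 + b4

# Average Z4 over many random masks, with and without the division.
trials = 20_000
Z4_plain = np.zeros_like(Z4_no_dropout)
Z4_inverted = np.zeros_like(Z4_no_dropout)
for _ in range(trials):
    Z4_plain += W4 @ dropout_forward(a3, keep_prob, rng, inverted=False) + b4
    Z4_inverted += W4 @ dropout_forward(a3, keep_prob, rng) + b4
Z4_plain /= trials
Z4_inverted /= trials

# Plain dropout shrinks the W4 @ a3 part to about keep_prob of its value;
# inverted dropout brings the average back to the no-dropout value.
print("plain vs 0.8 * W4 @ a3 + b4:", np.max(np.abs(Z4_plain - (keep_prob * (W4 @ a3) + b4))))
print("inverted vs no dropout:     ", np.max(np.abs(Z4_inverted - Z4_no_dropout)))
```

Because the rescaling happens during training, nothing has to be rescaled at test time, when Dropout is switched off.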

I hope this explanation gives you some intuition for why we apply ‘inverted dropout’.

Juan
