Inverted Dropout - Query

paulinpaloalto · June 4, 2022, 7:27pm

Well, remember that it’s the same keep_prob value that you’re using for both the downscaling (by dropping nodes) and the upscaling by multiplying by 1/keep_prob, right? So if it’s close to 1, then so is its multiplicative inverse.

1/0.9 = 1.1111…

So you’re not “substantially increasing” the norm of the activation matrix. You’re increasing it in a manner that is commensurate with the amount you decreased it by doing the dropout. Of course the dropout is both statistical and quantized, so the compensation may not be exact on any given iteration, but the point is that it is commensurate and on a statistical basis will be as close as you can get. Everything here is playing out over hundreds or thousands of iterations, so it’s all statistical behavior in any case.
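To make the "commensurate on a statistical basis" point concrete, here is a minimal NumPy sketch. It is not the course's assignment code: the helper name `inverted_dropout`, the array shape, and the random stand-in activations are all assumptions made for illustration. It applies a fresh dropout mask many times and compares the sum of the activations after dropout and rescaling to the sum before.

```python
import numpy as np

def inverted_dropout(a, keep_prob, rng):
    """Hypothetical helper: zero each unit with probability 1 - keep_prob,
    then rescale the survivors by 1 / keep_prob."""
    mask = rng.random(a.shape) < keep_prob  # True where the unit is kept
    return (a * mask) / keep_prob

rng = np.random.default_rng(0)
# Stand-in non-negative activations (assumed shape; think of a post-ReLU layer).
a = np.abs(rng.standard_normal((50, 200)))
keep_prob = 0.9

# Ratio of summed activations after vs. before dropout, over many random masks.
ratios = np.array([
    inverted_dropout(a, keep_prob, rng).sum() / a.sum()
    for _ in range(1000)
])

mean_ratio = ratios.mean()  # averages out very close to 1
spread = ratios.std()       # small but nonzero: no single mask compensates exactly

print(f"mean ratio: {mean_ratio:.4f}, std across masks: {spread:.4f}")
```

Any single mask gives a ratio slightly above or below 1, which is the "may not be exact on any given iteration" part. The average over many masks is what the 1/keep_prob factor is designed to keep at 1.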

Here’s a thread that discusses this more and actually shows some examples of the effect of scaling on the 2-norm.

Here’s another interesting thread about dropout that discusses another subtle point: whether the dropout is the same across all samples in the batch.
