Inverted Dropout - Query

Utkarsh2707 · June 4, 2022, 7:09pm

Hey!

I had a query about the scaling up of the activation matrix.

So when you keep keep_prob close to 1, say 0.9 or 0.8, the probability of dropping a node is close to 0 (0.1 or 0.2). If you think in terms of a large layer (say 100 units), not many units are being dropped.

But when you scale up the activation matrix by dividing it by keep_prob, you increase its non-zero values, which increases its norm (L2 and Frobenius) substantially.

So, my question is: even though the norm and the values of the entire matrix increase substantially, would training still practically reduce the cost function after every iteration? Could the weight matrix still be optimized effectively despite this?

[If I've got some concepts wrong, or if you think I'll be able to understand this better after doing the assignment, please enlighten me about that]

paulinpaloalto · June 4, 2022, 7:27pm

Well, remember that it’s the same keep_prob value that you’re using for both the downscaling (by dropping nodes) and the upscaling by multiplying by 1/keep_prob, right? So if it’s close to 1, then so is its multiplicative inverse.

1/0.9 = 1.1111…

So you’re not “substantially increasing” the norm of the activation matrix. You’re increasing it in a manner that is commensurate with the amount you decreased it by doing the dropout. Of course the dropout is both statistical and quantized, so the compensation may not be exact on any given iteration, but the point is that it is commensurate and on a statistical basis will be as close as you can get. Everything here is playing out over hundreds or thousands of iterations, so it’s all statistical behavior in any case.
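To make that concrete, here is a minimal NumPy sketch, not the course's assignment code. The helper name `inverted_dropout`, the layer shape, and the random activations are all assumptions chosen for illustration. It compares the mean and the Frobenius norm of an activation matrix before and after inverted dropout with keep_prob = 0.9.

```python
import numpy as np

def inverted_dropout(a, keep_prob, rng):
    """Zero each entry with probability 1 - keep_prob, then rescale by 1/keep_prob.

    Hypothetical helper for illustration only.
    """
    mask = rng.random(a.shape) < keep_prob  # True where the unit is kept
    return (a * mask) / keep_prob

rng = np.random.default_rng(0)

# Assumed setup: a layer with 100 units and a batch of 1000 samples,
# with activations drawn uniformly from [0, 1).
a = rng.random((100, 1000))
keep_prob = 0.9

a_drop = inverted_dropout(a, keep_prob, rng)

mean_ratio = a_drop.mean() / a.mean()
norm_ratio = np.linalg.norm(a_drop) / np.linalg.norm(a)  # Frobenius norm

print("mean ratio:", mean_ratio)
print("norm ratio:", norm_ratio)
```

Under these assumptions the mean ratio should come out very close to 1, because the rescaling preserves the expected value of each activation. The norm ratio should come out slightly above 1, roughly 1/sqrt(keep_prob), because the expected squared norm after inverted dropout is the original squared norm divided by keep_prob. For keep_prob = 0.9 that is an increase of about 5%, not a substantial one.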

Here’s a thread that discusses this more and actually shows some examples of the effect of scaling on the 2-norm.

Here’s another interesting thread about dropout that discusses another subtle point: whether the dropout is the same across all samples in the batch.
