Inverted Dropout - Query

Well, remember that it’s the same keep_prob value that you’re using for both the downscaling (dropping nodes) and the upscaling (multiplying by 1/keep_prob), right? So if keep_prob is close to 1, then so is its multiplicative inverse. For example, with keep_prob = 0.9:

1/0.9 = 1.1111…

So you’re not “substantially increasing” the norm of the activation matrix. You’re scaling it up by an amount that is commensurate with the reduction caused by the dropout. Of course the dropout is both statistical and quantized (each node is either kept or dropped entirely), so the compensation may not be exact on any given iteration. The point is that it is commensurate: on average, the rescaled activations have the same expected value as the original ones. Everything here plays out over hundreds or thousands of iterations, so it’s all statistical behavior in any case.
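
If you want to see the compensation numerically, here is a minimal NumPy sketch. The function name inverted_dropout, the all-ones activation matrix, and the fixed seed are my own choices for illustration, not anything from the course code. It compares the mean activation before and after dropout, since the mean (equivalently the sum) is what the 1/keep_prob scaling preserves in expectation.

```python
import numpy as np


def inverted_dropout(a, keep_prob, rng):
    """Drop each entry of `a` with probability 1 - keep_prob,
    then rescale the survivors by 1 / keep_prob."""
    # Boolean mask: True where the unit is kept.
    mask = rng.random(a.shape) < keep_prob
    return (a * mask) / keep_prob


# Hypothetical setup: a large all-ones activation matrix and a fixed seed.
rng = np.random.default_rng(0)
a = np.ones((1000, 1000))
keep_prob = 0.9

d = inverted_dropout(a, keep_prob, rng)

# Fraction of units that survived this particular draw (close to keep_prob).
kept_fraction = np.count_nonzero(d) / d.size

# Ratio of mean activation after vs. before dropout (close to 1).
ratio = d.mean() / a.mean()

print("kept fraction:", kept_fraction)
print("mean ratio:   ", ratio)
```

On any single draw the ratio will not be exactly 1, because the number of dropped units fluctuates, but it should sit very close to 1 for a matrix this large. Try a smaller matrix or a lower keep_prob to see the fluctuations grow.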

