Dropout regularization activation

Note that any kind of regularization (dropout, L2, or any other) happens only at training time, not at test time. The “reverse scaling” we do when dropout is active is as you say: we scale up the outputs that are not “zapped” by dropout so that the subsequent layers get roughly the same amount of “energy” from the dropout layer. Then at test time, we use all the trained neurons, and neither dropout nor the reverse scaling happens.
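Here’s a minimal numpy sketch of that “inverted dropout” idea, just for illustration (the function name `dropout_forward` and the `keep_prob` value are my own choices, not anything from the assignment code):

```python
import numpy as np

def dropout_forward(a, keep_prob=0.8, training=True):
    """Illustrative inverted dropout on an activation matrix `a`.

    During training: zero out each unit with probability (1 - keep_prob)
    and scale the surviving units by 1 / keep_prob, so the expected
    activation seen by the next layer stays roughly the same.
    At test time: return `a` unchanged -- no dropout, no scaling.
    """
    if not training:
        return a
    mask = (np.random.rand(*a.shape) < keep_prob)  # 1 = keep, 0 = "zap"
    return (a * mask) / keep_prob                  # scale up the survivors

# Quick sanity check: the mean activation is roughly preserved
a = np.random.rand(4, 1000)
print(a.mean(), dropout_forward(a, keep_prob=0.8, training=True).mean())
```

The point of dividing by `keep_prob` during training is exactly the “same amount of energy” argument above: with it, the expected value of each activation is unchanged, so at test time you can simply use the full network with no extra correction.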

If what I said above doesn’t answer your questions, here’s a thread from a while back with a more detailed discussion of the scaling issues for dropout.