Inverted Dropout

Yes, if we don’t do anything else, the individual values of the non-zapped neurons stay the same. But that’s not the point: there are fewer of them outputting non-zero values, and that’s what dropout is about. The point is the aggregate amount of output from all the neurons in the layer (zapped and non-zapped) taken together. One way to assess that is to take the 2-norm of the output activation matrix with and without dropout, first without applying the 1/keep_prob factor, and watch what happens. Then run the same experiment again with the 1/keep_prob factor included. You don’t have to wonder about this stuff: you can actually try it and watch what happens.
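Here’s a minimal numpy sketch of that experiment, just to make it concrete. The shape of A, the random seed, and the keep_prob value are arbitrary choices for illustration, and np.linalg.norm gives the Frobenius norm of a matrix by default, which is fine for seeing the aggregate effect:

```python
import numpy as np

np.random.seed(1)

# Illustrative values only -- in the assignment A would be the activation
# output of some layer, and keep_prob is whatever you choose.
keep_prob = 0.8
A = np.random.rand(5, 10)            # pretend activation matrix (units x examples)

# Dropout mask: each entry survives with probability keep_prob
D = (np.random.rand(*A.shape) < keep_prob).astype(float)

A_dropped = A * D                    # zap roughly 20% of the outputs
A_inverted = A_dropped / keep_prob   # inverted dropout: scale the survivors up

print("norm of A with no dropout:          ", np.linalg.norm(A))
print("norm after dropout, no 1/keep_prob: ", np.linalg.norm(A_dropped))
print("norm after dropout with 1/keep_prob:", np.linalg.norm(A_inverted))
```

You should see the norm drop when the activations are zapped, and come back up once the 1/keep_prob factor is applied.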

Speaking of “watching what happens”, note that keep_prob is a probability, meaning a number between 0 and 1. Try dividing 42 by 0.8 and watch what happens.
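Spoiler, in case you don’t want to fire up a Python prompt: dividing by a number less than 1 makes the result bigger, which is exactly how the surviving activations get scaled back up.

```python
>>> 42 / 0.8
52.5
```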
