Inverted Dropout Intuition?

The point is that (as Nobu showed) we are eliminating a certain percentage of the outputs of the layer on each training iteration. But once the network is fully trained and we are using it in "prediction" mode, we will not be doing dropout at all. So we want the next and subsequent layers to be trained on the standard amount of "activation energy" that the layer produces without dropout. That is why we compensate for the dropped neurons by scaling the surviving activations up by 1/keep_prob during training, so the expected magnitude of the layer's output stays the same (see the sketch below).
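Here is a minimal sketch of that idea in NumPy, assuming the activations are in an array `a` and the keep probability is `keep_prob` (the names are illustrative, not taken from the course code):

```python
import numpy as np

def inverted_dropout_forward(a, keep_prob, training=True):
    """Apply inverted dropout to the activations `a` of one layer."""
    if not training:
        # Prediction mode: no dropout and no scaling needed.
        return a
    # Keep each unit with probability keep_prob, zero out the rest.
    mask = np.random.rand(*a.shape) < keep_prob
    a = a * mask
    # Scale the survivors up so the expected total activation is unchanged,
    # matching what the next layer will see at prediction time.
    return a / keep_prob
```

The division by `keep_prob` is the "inverted" part: because the compensation happens at training time, the prediction-time code needs no special handling at all.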

This has been discussed before. Here's a pretty long thread that goes over these points and even illustrates the effect on the norm of the output activations with actual examples (but you need to read all the way through, not just the first couple of posts).
