Inverted Dropout

I don’t understand the statement in your question 1. Can you give a reference to where Prof Ng says that? The offset into the video would be most useful.

For 2) and 3), the point that you are missing is that dropout zeros certain specific neurons on each iteration. The neurons that get “zapped” are chosen randomly, so they differ for each sample and on each iteration. Then we need to compensate for those particular missing neurons by increasing the magnitude of all the other neurons that we did not “zap” in that particular iteration, which is what dividing by keep_prob does. Thus the total amount of “activation energy” stays (roughly) the same, but it comes from different neurons. The whole point of dropout is that it weakens the connections between particular output neurons and the input neurons at the next layer. But we don’t want an overall reduction in the amount of “energy” being output, as expressed (for example) by the mean of A for the layer, or more loosely by its 2-norm.

In particular for point 3), Prof Ng is making an analogy to the concept of “expected value” in statistics. Even though we are zapping some neurons in the layer on each iteration, we want the “expected value” of the activations, viewed at the aggregate level, to stay roughly constant.
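
To make the compensation concrete, here is a minimal NumPy sketch of inverted dropout applied to one layer’s activations. The function name inverted_dropout, the array shape, and the value keep_prob = 0.8 are just assumptions for illustration, not code taken from the course.

```python
import numpy as np

def inverted_dropout(A, keep_prob, rng):
    """Apply inverted dropout to activations A of shape (units, samples)."""
    # Draw a fresh random mask: a different set of units is dropped
    # for every sample (column) and on every call (iteration).
    mask = rng.random(A.shape) < keep_prob
    # Zero the dropped units, then scale the survivors up by 1/keep_prob
    # so the expected value of each activation is unchanged.
    A_drop = A * mask / keep_prob
    return A_drop, mask

rng = np.random.default_rng(0)   # seeded only to make the sketch repeatable
A = np.ones((1000, 200))         # hypothetical activations, all equal to 1
keep_prob = 0.8                  # assumed value for illustration

A_drop, mask = inverted_dropout(A, keep_prob, rng)

mean_before = A.mean()
mean_after = A_drop.mean()
print("mean before dropout:", mean_before)
print("mean after dropout: ", mean_after)
print("fraction of units kept:", mask.mean())
```

The mean after dropout should land close to the mean before it, even though roughly 20% of the entries are now zero and the rest have been scaled up to 1/keep_prob. If you remove the division by keep_prob, the mean instead falls to about keep_prob times the original, which is exactly the overall reduction that the scaling is there to prevent.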