Doubt about the implementation of inverted dropout

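To be more specific, at training time you're multiplying the activations elementwise by a mask of independent Bernoulli random variables, each with expected value exactly keep_probs. That scales the expected value of every activation by keep_probs, so you divide by keep_probs to compensate. The expected activation then matches the one without dropout, which is why no extra scaling is needed at test time.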
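If it helps to see it numerically, here is a minimal NumPy sketch of that idea. The function name `inverted_dropout`, the `rng` argument, and the all-ones test array are my own choices for illustration, not anything from the course code; only `keep_probs` comes from the explanation above.

```python
import numpy as np

def inverted_dropout(a, keep_probs, rng):
    """Apply inverted dropout to activations `a` (hypothetical helper)."""
    # Bernoulli mask: each entry is True with probability keep_probs.
    mask = rng.random(a.shape) < keep_probs
    # Zero out the dropped units, then divide by keep_probs so that the
    # expected value of the output equals the original activation.
    return a * mask / keep_probs

rng = np.random.default_rng(0)
a = np.ones((1000, 1000))           # toy activations, all equal to 1
out = inverted_dropout(a, keep_probs=0.8, rng=rng)

print(out.mean())                   # should be close to 1.0
```

Each surviving unit becomes 1 / 0.8 = 1.25 and roughly 20% of the units become 0, so the average stays near the original value of 1. Without the division, the average would drop to about 0.8.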

Let me know if that helped.
