To be more specific, at training time you’re multiplying the activations with a vector of independent Bernoulli random variables whose expected value is precisely keep_probs, so you divide by keep_probs to compensate for this.
Let me know if that helped.