Dropout: Why divide by keep prob?

Hi, I am struggling to understand the reason behind the 4th step of dropout, which is described as follows.

Divide 𝐴[1] by keep_prob. By doing this you are assuring that the result of the cost will still have the same expected value as without drop-out. (This technique is also called inverted dropout.)

I understand what the 4th step does, but can you elaborate on the 2nd sentence above? How do we tie it back to the cost function?

Please see this recent thread for a pretty thorough discussion of this issue.
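
In short: each unit is kept with probability keep_prob, so after masking, the expected value of an activation a is keep_prob * a. Dividing by keep_prob brings that expectation back to a, so the next layer (and everything downstream, including the cost) sees inputs on roughly the same scale as it would without dropout. Here is a minimal sketch of that argument, not the course's code. It uses only the Python standard library, and the function name `inverted_dropout`, the seed, and the all-ones activations `a1` are my own assumptions, chosen to make the averages easy to read.

```python
import random

def inverted_dropout(activations, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob, then rescale survivors."""
    out = []
    for a in activations:
        keep = rng.random() < keep_prob  # True with probability keep_prob
        out.append(a / keep_prob if keep else 0.0)
    return out

rng = random.Random(0)   # fixed seed (assumption) so the run is repeatable
keep_prob = 0.8
a1 = [1.0] * 100_000     # hypothetical activations, all 1.0 for clarity

dropped = inverted_dropout(a1, keep_prob, rng)

# Mean with no dropout at all.
mean_without_dropout = sum(a1) / len(a1)
# Mean after masking AND dividing by keep_prob (inverted dropout).
mean_with_scaling = sum(dropped) / len(dropped)
# Mean after masking only (undo the division to see what skipping it gives).
mean_without_scaling = sum(d * keep_prob for d in dropped) / len(dropped)

print(mean_without_dropout, mean_with_scaling, mean_without_scaling)
```

With the division, the average stays close to 1.0; without it, the average shrinks to about keep_prob. Note that the match is exact in expectation only for the activations themselves. Because later layers are nonlinear, the cost is kept comparable rather than guaranteed to be identical.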
