After seeing the videos, dropout makes a lot of sense to me and I can get the basic intuition on why it works.

What I did not really understand is the division of the activations `a` by the "keep probability". Andrew went a bit too fast for me here. Can anybody explain that in a bit more detail?


Hi @ralfphonso ,

There have been some interesting discussions about this; I'm sure you will find them if you search. Here are a couple of them:

Hi Mentor,
We had a couple of doubts. Can you please help clarify?
What does the statement "output at test time = expected output at training time" mean? We cannot understand the intuition behind it.
Also, why should we compensate Z4 by dividing by keep_prob? If we divide by keep_prob, doesn't that undo the effect of dropout for that layer? We are applying dropout to the nodes in the layer, so if we then divide by keep_prob …
To be more specific: at training time you are multiplying the activations by a vector of independent Bernoulli random variables whose expected value is precisely keep_prob, so you divide by keep_prob to compensate and keep the expected value of the activations unchanged. That way, at test time (when no dropout is applied) the layer sees inputs on the same scale it saw during training.
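Here is a minimal sketch of that compensation (often called "inverted dropout") in plain Python. The all-ones activations and variable names are just for illustration, not from the course code:

```python
import random

random.seed(0)
keep_prob = 0.8

# Hypothetical activations, all 1.0 so the expected mean is easy to check.
a = [1.0] * 100_000

# Inverted dropout: keep each activation with probability keep_prob,
# zero it otherwise, then divide the survivors by keep_prob so that
# E[a_dropped] == a (since E[Bernoulli(keep_prob)] / keep_prob == 1).
a_dropped = [(x if random.random() < keep_prob else 0.0) / keep_prob
             for x in a]

mean = sum(a_dropped) / len(a_dropped)
# mean is close to 1.0: the scaling keeps the expected activation unchanged,
# even though ~20% of the units were zeroed out.
```

Without the division, the surviving activations would sum to roughly keep_prob times their original expected value, and the next layer (and the test-time network, which uses all units) would see inputs on a different scale than it was trained on.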
Let me know if that helped.