Hi, can anyone explain the causal relationship between regularization and keep_prob? Not quite sure how mathematically reducing the keep_prob increases regularization effects.
A lower "keep probability" means that more neurons get randomly zapped (replaced with zero values) on each iteration. That stochastic zeroing weakens the dependence of the neurons in a given layer on any particular outputs of the previous layer (the one to which dropout was applied), and that is the mechanism through which dropout achieves its regularization effect. The more you zero, the stronger the effect. Of course there can also be such a thing as too much regularization.
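To make that concrete, here is a minimal sketch of "inverted dropout" as covered in the course, written in NumPy (the function name `dropout_forward` is just for illustration). Each activation is kept with probability `keep_prob` and zeroed otherwise; the survivors are scaled by `1 / keep_prob` so the expected value of the layer's output stays the same. Lowering `keep_prob` zeroes more units, which is exactly the "stronger regularization" knob:

```python
import numpy as np

def dropout_forward(a, keep_prob, rng):
    # Keep each unit with probability keep_prob, zero it otherwise.
    mask = (rng.random(a.shape) < keep_prob).astype(a.dtype)
    # Scale the survivors by 1/keep_prob ("inverted dropout") so that
    # the expected activation is unchanged and no rescaling is needed
    # at test time, when dropout is turned off.
    return a * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones(1000)
out = dropout_forward(a, keep_prob=0.8, rng=rng)
# Roughly 20% of entries are zero; the rest are 1/0.8 = 1.25,
# so the mean stays close to the original value of 1.0.
print(out.mean())
```

Note that the random mask is redrawn on every forward pass, so each training iteration effectively trains a slightly different, thinned network.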
Prof Ng explained all this in the lectures in some detail. If it didn’t all make sense the first time through, it might be a good idea just to watch them again with what I said above in mind.