Hi there,
I’m wondering why this option is incorrect. Shouldn’t we invert the dropout at test time by dividing by keep_prob?
{moderator edit - quiz questions and answers removed}
Thanks in advance!
Sorry, but that’s incorrect. The point is that you never apply any of the regularization techniques at test time: you only apply them during training, where they affect how the model is trained. Once the model is trained, you just use it “as is” without any regularization. Since you aren’t dropping any neurons at test time, there is no need to compensate with the “inverted” multiplication. The whole point of the “inverted” part of inverted dropout is that, during training with dropout active, multiplying by 1/keep_prob compensates for the dropped “activation energy”, so you don’t have to compensate for it at test time. Interestingly, if you go back and read the original Hinton paper that introduced dropout, they hadn’t yet figured out the inverted part: they trained without the 1/keep_prob scaling and instead scaled by keep_prob at test time. Later they realized there was a cleaner way to accomplish the same thing. Here’s a thread that goes over a lot of this, including the point about the Hinton paper, if you read some of the later posts (e.g. starting with this one).
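To make that concrete, here is a minimal numpy sketch (a hypothetical dropout_forward helper, not the course’s assignment code) of inverted dropout applied to a layer’s activations: the 1/keep_prob scaling happens only during training, and the test-time path just returns the activations untouched.

```python
import numpy as np

def dropout_forward(a, keep_prob, training=True):
    """Inverted dropout on an activation matrix `a` (hypothetical helper)."""
    if not training:
        return a                                    # test time: no dropping, no scaling
    mask = (np.random.rand(*a.shape) < keep_prob)   # 1 = keep the unit, 0 = drop it
    return (a * mask) / keep_prob                   # the "inverted" 1/keep_prob scaling

np.random.seed(1)
a = np.random.randn(4, 5)
a_train = dropout_forward(a, keep_prob=0.8)                  # ~80% of units kept, survivors scaled up
a_test  = dropout_forward(a, keep_prob=0.8, training=False)  # exactly `a`, no compensation needed
```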
Of course, note that you may choose to leave the dropout code in your forward and backward propagation logic, including the factor of 1/keep_prob; when you are not training, or when you are training but don’t want to use dropout, you just set keep_prob = 1 to disable that code.
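With the hypothetical helper above, for example, keep_prob = 1 makes the mask all ones and the division a divide-by-one, so the dropout code becomes a no-op without having to remove it:

```python
# keep_prob = 1: every unit survives and the scaling is 1/1, so the
# dropout branch leaves the activations unchanged.
a_no_dropout = dropout_forward(a, keep_prob=1.0, training=True)
assert np.allclose(a_no_dropout, a)
```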