Week 1-Regularization-dropout-scale?

Running one particular test case doesn’t constitute a general proof of anything, right? Perhaps there are cases in which the scaling doesn’t make much difference. Just out of curiosity, what was the keep_prob value in your experiment? One assumes that Prof. Hinton and his group did more extensive experiments before they published the original paper on dropout. One interesting thing to note is that the original paper accomplished the reverse scaling in a different way, which is equivalent but a lot more inconvenient. Here’s a thread which discusses that point. Please read from the linked post forward on the thread. There is also a link to the Hinton paper included there.

Here’s another recent thread on the point about why the reverse scaling is useful.
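
To make the point about the reverse scaling concrete, here is a rough sketch in plain Python of what dividing by keep_prob does to the average activation. Everything in it is an assumption for illustration only: the helper names `inverted_dropout` and `mean`, the keep_prob of 0.8, and the all-ones activations are made up, not taken from the course assignment or from your experiment.

```python
import random


def inverted_dropout(activations, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob, then divide the
    surviving units by keep_prob (the "reverse scaling")."""
    out = []
    for a in activations:
        keep = rng.random() < keep_prob
        out.append(a / keep_prob if keep else 0.0)
    return out


def mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(0)            # fixed seed so the run is repeatable
keep_prob = 0.8                   # illustrative value only
activations = [1.0] * 100_000     # hypothetical layer output, all ones

dropped = inverted_dropout(activations, keep_prob, rng)
# Same mask, but with the 1 / keep_prob factor undone again.
unscaled = [d * keep_prob for d in dropped]

print("no dropout:          ", mean(activations))
print("dropout with scaling:", mean(dropped))    # stays close to 1.0
print("dropout, no scaling: ", mean(unscaled))   # shrinks to about keep_prob
```

With the scaling, the expected value of what the next layer sees during training roughly matches what it will see at test time, when nothing is dropped. Without it, the training-time activations are smaller by about a factor of keep_prob, so something has to compensate for that at test time instead.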