I’m quite confused about the operation D1 /= keep_prob in dropout regularization.
In my test, the accuracy with this operation is close to the accuracy without it: 92% on the training set and 95% on the test set with it, versus 93% on the training set and 95% on the test set without it.
Is it really necessary?
Just doing one particular test case doesn’t constitute a general proof of anything, right? Perhaps there are cases in which it doesn’t make that much difference. Just out of curiosity, what was the keep_prob value in your experiment?

One assumes that Prof Hinton and his group did more extensive experiments before they published the original paper on dropout. One interesting thing to note is that they accomplished the reverse scaling in a different (equivalent but a lot more inconvenient) way in the original paper. Here’s a thread which discusses that point. Please read from the linked post forward on the thread. There is also a link to the Hinton paper included there.
Here’s another recent thread on the point about why the reverse scaling is useful.
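To make the point about expected values concrete, here is a minimal numpy sketch (the array shape and the keep_prob value are hypothetical, not taken from the assignment) showing what the division does to the average activation:

```python
import numpy as np

np.random.seed(1)
keep_prob = 0.8                        # hypothetical value, just for illustration
A1 = np.random.rand(4, 5)              # stand-in for one layer's activations

# Dropout mask: keep each unit with probability keep_prob
D1 = np.random.rand(*A1.shape) < keep_prob

A1_dropped  = A1 * D1                  # plain dropout: ~20% of units zeroed out
A1_inverted = A1_dropped / keep_prob   # inverted dropout: rescale the survivors

print("original mean:            %.4f" % A1.mean())
print("dropped, no rescale:      %.4f" % A1_dropped.mean())    # roughly keep_prob * original
print("dropped, with /keep_prob: %.4f" % A1_inverted.mean())   # back near the original mean

# The original Hinton et al. formulation instead left the training activations
# unscaled and multiplied the weights by keep_prob at test time: equivalent in
# expectation, but you have to remember to modify the network at test time.
```

Without the division, the inputs each layer sees during training are on average about keep_prob times smaller than they will be at test time (when dropout is switched off), which is exactly the scale mismatch the “inverted” version removes during training.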
Thanks for answering my question. It helps a lot.