Inverted Dropout

Ok, I think I understand what is going on now:

The method in the paper is the original formulation; it was the first paper on dropout. The method Prof Ng shows ("inverted dropout") is a different and arguably cleaner way to achieve the same result.

In the original formulation, the weights have to be downscaled by multiplying by keep_prob at test time (and any other time they are used) because nothing compensates for dropout during training. Inverted dropout does the compensation during training instead: if you multiply the retained activations by 1/keep_prob, their expected value matches what the full network would produce without dropout, so the weights learned during training are already at the right scale. Once training is done, the weights are the weights, and you don't need to remember what keep_prob was or even that dropout was used at all.

It's just a simpler way of achieving the same result. I bet if Prof Hinton had thought of that formulation at the time, they would have written the paper that way.
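Here is a minimal NumPy sketch of the difference, the way I understand it. The keep_prob value, the toy activations, and the Monte Carlo averaging are just my own illustration (not anything from the course or the paper): it checks that inverted dropout already matches the no-dropout activations in expectation, while the original formulation comes out scaled by keep_prob, which is why the paper rescales the weights at test time.

```python
import numpy as np

rng = np.random.default_rng(0)
keep_prob = 0.8

# Toy activations for one hidden layer (5 units).
a = rng.standard_normal(5)

# Simulate many training-time dropout masks at once.
n = 200_000
masks = (rng.random((n, a.size)) < keep_prob).astype(a.dtype)

# Inverted dropout (what Prof Ng shows): zero out units, then divide by keep_prob.
avg_inverted = (masks * a / keep_prob).mean(axis=0)

# Original dropout (the paper): zero out units, no rescaling during training.
avg_original = (masks * a).mean(axis=0)

print(np.round(a, 3))              # test-time activations, used as-is
print(np.round(avg_inverted, 3))   # ~= a            -> no correction needed at test time
print(np.round(avg_original, 3))   # ~= keep_prob*a  -> hence the paper multiplies the
print(np.round(keep_prob * a, 3))  #                    weights by keep_prob at test time
```

So in both cases the test-time network sees the same expected signal; the only question is whether you pay for it with a 1/keep_prob factor during training or a keep_prob factor forever after.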
