Dropout regularization activation

paulinpaloalto · January 2, 2022, 9:01pm

Note that regularization of any kind (dropout, L2 or anything else) is applied only at training time, not at test time. The “reverse scaling” that we do when dropout is happening is as you say: it scales up the outputs that are not “zapped” by dropout, so that the subsequent layers get roughly the same amount of “energy” from the dropout layer as they would without dropout. Then at test time, we use all the trained neurons, and neither dropout nor the reverse scaling happens.
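To make the train/test difference concrete, here is a minimal sketch of inverted dropout in plain Python. The helper name `dropout_forward`, the `training` flag, and the flat list of activations are my own assumptions for illustration, not code from the course; `keep_prob` is the probability of keeping a unit.

```python
import random

def dropout_forward(activations, keep_prob, training, rng=None):
    """Inverted dropout on a flat list of activations (hypothetical helper)."""
    if not training:
        # Test time: no units are dropped and no rescaling is applied.
        return list(activations), None
    rng = rng or random.Random()
    # Keep each unit independently with probability keep_prob.
    mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in activations]
    # "Reverse scaling": divide the survivors by keep_prob so the expected
    # value of each output matches the value without dropout.
    out = [a * m / keep_prob for a, m in zip(activations, mask)]
    return out, mask

a = [1.0] * 10000
train_out, mask = dropout_forward(a, keep_prob=0.8, training=True,
                                  rng=random.Random(0))
test_out, _ = dropout_forward(a, keep_prob=0.8, training=False)

# The training-time mean stays close to 1.0 even though about 20% of the
# units were zeroed; the test-time output is simply the input.
print(sum(train_out) / len(train_out))
print(sum(test_out) / len(test_out))
```

The mask is returned because back propagation during training would need to apply the same mask and the same scaling to the gradients.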

If what I said above doesn’t answer your questions, here’s a thread from a while back with more detailed discussion on the scaling issues for dropout.
