Dropout regularization activation

ABHAY3 · January 2, 2022, 2:39pm

Hello sir,
I am a student.
I need to know whether I am right about dropout, mainly the activation scaling.
I watched Andrew’s explanation three or four times and read a post in the discussions about my doubt.
So am I right when I say:
we scale up a3 by dividing it by 0.8 (the keep probability) so that we get a similar activation energy to when we don’t use dropout?

Let’s say that for example A, when we randomly drop some neurons and don’t scale the other neurons up, the activation comes out to be 8.
But when we TEST the model on this example, we use all the neurons, so the activation value might go well beyond 8 (let’s say 13) because of the extra neurons.
So, to solve this issue (the training activation (8) is far less than the testing activation (13) for the same example), we scale the other neurons UP.
8 and 13 are just illustrative values for my doubt.

paulinpaloalto · January 2, 2022, 9:01pm

Note that any kind of regularization (dropout, L2 or any other) happens only at training time, not test time. The “reverse scaling” that we do when dropout is happening is as you say: it is to scale up the outputs that are not “zapped” by dropout so that the subsequent layers get roughly the same amount of “energy” from the dropout layer. Then at test time, we use all the trained neurons and neither dropout nor the reverse scaling happens.
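To make the “same amount of energy” point concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the function names `plain_dropout` and `inverted_dropout`, the all-ones layer output `a3`, and the keep probability of 0.8 are made up for this example and are not taken from the course code. It averages the summed output of a layer over many random masks, with and without the reverse scaling.

```python
import random


def plain_dropout(activations, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob; no rescaling."""
    return [a if rng.random() < keep_prob else 0.0 for a in activations]


def inverted_dropout(activations, keep_prob, rng):
    """Same masking, but survivors are divided by keep_prob so the
    expected sum matches the sum with no dropout at all."""
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)      # fixed seed so the run is repeatable
a3 = [1.0] * 10             # hypothetical layer output; full sum is 10.0
keep_prob = 0.8             # hypothetical keep probability
trials = 20000

# Average the layer's summed output over many random dropout masks.
avg_plain = sum(sum(plain_dropout(a3, keep_prob, rng))
                for _ in range(trials)) / trials
avg_inverted = sum(sum(inverted_dropout(a3, keep_prob, rng))
                   for _ in range(trials)) / trials

print(f"all neurons, no dropout:  {sum(a3):.2f}")
print(f"dropout, no rescaling:    {avg_plain:.2f}")
print(f"dropout, with rescaling:  {avg_inverted:.2f}")
```

With these made-up numbers, the unscaled average should land near 8 and the rescaled average near 10, which is the value the next layer sees at test time when every neuron is active.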

If what I said above doesn’t answer your questions, here’s a thread from a while back with more detailed discussion on the scaling issues for dropout.

ABHAY3 · January 3, 2022, 3:18am

I read this thread earlier, and only then asked.
I understand that dropout or any other regularization is not used at testing.
My only question is whether I am right with my logic that,
with the same NN architecture, if the activation for an example comes out to be some value,
then with zapped neurons the activation might change significantly FOR THE SAME EXAMPLE,
so to solve this issue we bump up the rest of the neurons.
My apologies if I wasn’t clear; I used the terms testing and training just to show that when testing we use the whole NN architecture and with dropout we use a smaller one.
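
The logic above can be written as a short expectation argument. This is only a sketch, assuming each neuron is kept independently with probability p. The symbols d_i (the random keep/drop mask), p (the keep probability) and \tilde{a}_i (the activation after dropout and rescaling) are introduced here for illustration and do not come from the thread.

```latex
d_i \sim \mathrm{Bernoulli}(p), \qquad
\mathbb{E}[d_i \, a_i] = p \, a_i
\quad \text{(no rescaling: the activation shrinks on average)}

\tilde{a}_i = \frac{d_i \, a_i}{p}, \qquad
\mathbb{E}[\tilde{a}_i] = \frac{p \, a_i}{p} = a_i
\quad \text{(with rescaling: the full-network value is recovered on average)}
```

So for the same example, bumping up the surviving neurons by 1/p keeps the expected activation equal to what the full architecture would produce, although any single random mask can still give a value above or below it.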
