Inverted Dropout

I understand that dividing by keep_prob scales the activations back up by roughly the same amount that they were reduced by the removal of some nodes.
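
Here is a minimal sketch of what that looks like in numpy (the function name, shapes, and variable names are just for illustration):

```python
import numpy as np

def inverted_dropout(A, keep_prob):
    """Illustrative inverted dropout for one layer's activations A."""
    # Keep each node with probability keep_prob
    D = np.random.rand(*A.shape) < keep_prob
    A = A * D           # zero out the dropped nodes
    A = A / keep_prob   # scale up so the expected value stays roughly the same
    return A
```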

But why do we need this value to stay roughly constant in the first place?

The reason is that all forms of regularization, including dropout, are applied only during training. When we actually use the trained network to make predictions, there is no dropout (which can be achieved without changing the code by simply setting keep_prob = 1). So if you don’t scale up to compensate for the dropout, the later layers will be trained in a way that won’t agree with what they actually receive when we run normal prediction: they will be trained to expect lower aggregate “energy” from the previous layers, so things may not work as well as expected in normal “prediction” mode without dropout.
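
A rough illustrative check (values and shapes are hypothetical) that the average activation the next layer sees during training with inverted dropout roughly matches what it sees at prediction time with no dropout:

```python
import numpy as np

np.random.seed(0)
A = np.random.rand(100, 1000)             # hypothetical layer activations
keep_prob = 0.8

D = np.random.rand(*A.shape) < keep_prob  # training-time dropout mask
A_train = (A * D) / keep_prob             # dropped nodes plus rescaling
A_pred = A                                # prediction: no dropout, no scaling

print(A_train.mean(), A_pred.mean())      # the two means come out close
```

Without the division by keep_prob, A_train.mean() would be about keep_prob times A_pred.mean(), which is exactly the mismatch between training and prediction described above.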

It’s been a while since I have watched Prof Ng’s lectures on this topic, so I don’t remember exactly what he says about this, but he must discuss this point. It might be worth watching again to see. (Update: yes, he mentions this between 8:30 and 9:00 in the main lecture on Dropout.)