I would state the intuition about how dropout works a bit differently than you do. The point is that the neurons that get dropped are different on each iteration, so the effect is to reduce overfitting by weakening the network's reliance on any specific connection between the outputs at one level and the inputs at the next level. Exactly how strong that weakening effect is depends on the keep_prob value that you use, of course. Maybe that subtlety in the intuition doesn't really affect the bigger point you are making here, but I thought it was worth stating.
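To make the "different neurons on each iteration" part concrete, here is a minimal numpy sketch of what happens to one layer's activations on a single training iteration (A1 and D1 are just my own placeholder names, not necessarily the ones from the assignment):

import numpy as np

keep_prob = 0.8                              # probability that any given output survives

A1 = np.random.randn(4, 5)                   # activations of some hidden layer: (units, examples)
D1 = np.random.rand(*A1.shape) < keep_prob   # fresh random mask on every iteration
A1 = A1 * D1                                 # roughly (1 - keep_prob) of the outputs are zeroed

Since D1 is regenerated every iteration, no downstream neuron gets to depend on any one upstream output always being present.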
The problem is not that they can't learn; the question is what they learn. If you don't do the reverse scaling, then they potentially learn different things: they learn to react to weaker outputs, because that's what they are trained on. But then the point is that what they have learned may not fit as well with the data they see when you run actual predictions without the dropout logic in place, because at that point the outputs have more "energy". Did you read far enough in that thread I linked to see the part about the L2 norms of the outputs? Maybe that was earlier in the thread than the link I gave you.
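Here is a rough illustration of that mismatch with my own toy numbers (not the exact ones from the thread): if you train without the rescaling, the next layer is trained on systematically weaker inputs than it will see at prediction time.

import numpy as np

np.random.seed(1)
keep_prob = 0.8
A1 = np.random.randn(100, 1000)              # pretend hidden-layer outputs

D1 = np.random.rand(*A1.shape) < keep_prob
train_no_scaling = A1 * D1                   # what the next layer is trained on
predict = A1                                 # what it sees at prediction time (no dropout)

print(np.linalg.norm(train_no_scaling))      # noticeably smaller ...
print(np.linalg.norm(predict))               # ... than this: more "energy" at prediction time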
Maybe you are thinking too hard here. It actually seems like a pretty straightforward argument: you want the training conditions to be closer to what happens in prediction mode. You want only the stochastic weakening of the reaction to particular neuron outputs, without a general decrease in the L2 norm of the inputs to the next layer.
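In other words, the division by keep_prob is just there to undo that general shrinkage: in expectation, the rescaled outputs the next layer trains on have the same magnitude as the un-dropped outputs it will see in prediction mode. A quick sanity check, again just my own sketch:

import numpy as np

np.random.seed(2)
keep_prob = 0.8
a = 1.7                                      # one fixed activation value
masks = np.random.rand(1_000_000) < keep_prob

# Average of what the next layer sees for this output over many iterations:
print(np.mean(a * masks))                    # about keep_prob * a  (shrunk, no rescaling)
print(np.mean(a * masks / keep_prob))        # about a              (matches prediction mode)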