The key point to realize is that all forms of regularization (dropout, L2, Lasso, …) are applied only at training time, not at test time or in normal inference mode (prediction). So if you train without the inverse scaling (dividing the kept activations by the keep probability), the layers after a dropout layer will learn to expect a smaller amount of “energy” (activation magnitude) from the previous layers than they will actually receive at test time, when nothing is dropped.
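For concreteness, here is a minimal NumPy sketch of this “inverted dropout” idea (the function name and `keep_prob` parameter are just illustrative, not from any particular framework): the kept activations are scaled up at training time so the expected magnitude matches what the next layer sees at inference, where the function reduces to the identity.

```python
import numpy as np

def dropout_forward(x, keep_prob=0.8, training=True):
    """Inverted dropout: scale kept activations at training time
    so that no rescaling is needed at inference time."""
    if not training:
        # Inference / prediction: no units dropped, no scaling applied.
        return x
    # Keep each unit independently with probability keep_prob.
    mask = np.random.rand(*x.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation magnitude
    # the same as at test time, so later layers see consistent "energy".
    return x * mask / keep_prob
```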
This question has been asked a number of times before. Here’s a good thread to read, which also points to this one.