Week 1: dropout vs reducing network?

paulinpaloalto · January 30, 2023, 3:47pm

I think your interpretation of why dropout works is incorrect. The point is not that we are reducing the activation outputs: it is the subtle weakening of the dependence of a given neuron on any specific input from the previous layer. As described in the lectures, you are sampling a different, slightly reduced network on every iteration and for every training sample in the batch. This stochastic weakening of the connections is what reduces the overfitting. But note that when we actually apply the trained network to make a prediction, dropout is no longer used: we simply use the full trained network. That is true of all forms of regularization: they are applied only during training, not during inference. So if we don’t compensate for the reduced “expected value” of the activations during training, the network will not work as well in inference mode: its weights were trained to expect less total activation, but at inference it gets values from all the neurons.
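To make the “expected value” point concrete, here is a minimal NumPy sketch of the usual compensation, dividing the surviving activations by the keep probability (often called inverted dropout). The function name `inverted_dropout_forward`, the shapes, and the value `keep_prob = 0.8` are just assumptions for illustration, not code from the course assignments.

```python
import numpy as np

def inverted_dropout_forward(a, keep_prob, rng):
    """Training-time dropout on activations `a` (hypothetical helper).

    Each entry is kept with probability keep_prob, and the survivors are
    divided by keep_prob so the expected activation is unchanged.
    """
    # Independent mask entry per unit AND per sample (column), so every
    # sample in the batch effectively sees a different reduced network.
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob, mask

rng = np.random.default_rng(0)
a = np.ones((50, 10000))   # assumed layout: 50 units, 10000 samples
keep_prob = 0.8            # illustrative value only

dropped, mask = inverted_dropout_forward(a, keep_prob, rng)

# Without the division the average activation shrinks to about keep_prob;
# with it, the average stays close to the original value of 1.0.
unscaled_mean = (a * mask).mean()
scaled_mean = dropped.mean()
print(f"unscaled mean: {unscaled_mean:.3f}, scaled mean: {scaled_mean:.3f}")
```

At inference time you would simply skip this function and use `a` as is, which matches what the next layer saw on average during training.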

This question comes up frequently. Here’s another thread worth a look for this particular question. And here’s another one.
