Hi,
I found these sentences in the videos and text about why we need to divide the activations of each layer by that layer’s keep_prob. For example:
In order to not reduce the expected value of A[L]…
You’re assuring the result of the cost will still have the same expected value as without dropout… DOESN’T THE COST CHANGE IN EACH ITERATION ANYWAY?
Still, I am not 100% sure that I understand it deeply. My understanding is that when keep_prob is 0.8 we zero out 20% of the neuron outputs and then multiply the remaining values by 10/8 to scale them back up. But I am not sure why this scaling is necessary?
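In code, this is roughly what I think the step is doing (just a sketch with made-up shapes and values, not the assignment code):

```python
import numpy as np

np.random.seed(1)
keep_prob = 0.8

A = np.random.rand(5, 4)              # made-up layer activations (units x examples)
D = np.random.rand(5, 4) < keep_prob  # keep each activation with probability 0.8
A = A * D                             # step 1: zero out ~20% of the activations
A = A / keep_prob                     # step 2: scale the survivors by 1/0.8 = 10/8
```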
Thanks
If we remove 20% of the neuron units with keep_prob = 0.8, then the total output from the “remaining” units is only 80% of the original, since 20% of the units do not contribute anything. If we use 5 hidden layers and set keep_prob = 0.8, then the final output is 0.8^5 * (original output) = 0.32768 * (original output). If we add more layers, we lose even more of the output. To avoid this situation, we want to keep the total output of each layer equal to the original one. In this sense, (original output) * 0.8 / 0.8 keeps the total output the same even though some of the units are switched off by dropout.
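Here is a quick numpy sketch of that cumulative shrinkage. It is only illustrative: it ignores the weights and nonlinearities between layers and just tracks the scaling, which is an assumption made for clarity.

```python
import numpy as np

np.random.seed(0)
keep_prob = 0.8

# pretend every unit outputs 1.0 so the scaling is easy to see
A_plain = np.ones((100, 1000))     # plain dropout: drop units, no rescaling
A_inverted = np.ones((100, 1000))  # inverted dropout: drop units, divide by keep_prob

for layer in range(5):             # 5 hidden layers, each applying dropout
    D = np.random.rand(100, 1000) < keep_prob
    A_plain = A_plain * D
    A_inverted = (A_inverted * D) / keep_prob

print(A_plain.mean())     # ~0.8^5 = 0.32768 -> the signal has shrunk
print(A_inverted.mean())  # ~1.0 -> the original scale is preserved
```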
The point is that (as Nobu showed) we are eliminating a certain percentage of the outputs of the layer on each iteration. But once the network is fully trained and we are using it in “prediction” mode, we will not be doing dropout at all. So we want the next and subsequent layers to be trained on the standard amount of “activation energy” that the layer is producing. That is why we compensate for the dropped neurons in that way.
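A small numpy check of that statement (with assumed random activations, not the course code): the average of the inverted-dropout output over many training iterations matches the activation the layer produces at prediction time with no dropout at all.

```python
import numpy as np

np.random.seed(0)
keep_prob = 0.8
A = np.random.rand(10, 1)    # made-up activations of one layer for one example

# prediction time: no dropout, activations are used as-is
A_predict = A

# training time: inverted dropout, averaged over many iterations
A_train_avg = np.mean(
    [A * (np.random.rand(10, 1) < keep_prob) / keep_prob for _ in range(100000)],
    axis=0,
)

print(np.abs(A_train_avg - A_predict).max())  # close to 0: same expected "activation energy"
```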
This point has been discussed before. Here’s a pretty long thread that goes over these points and even illustrates the point about the norm of the output activations with actual examples (but you need to read all the way through, not just the first couple of posts).
Paul, I appreciate your addition. I forgot to describe the difference between training time and prediction time, which is the main reason for this operation.