Backward_Propagation_With_Dropout

Nirvana_Laha · May 24, 2021, 6:23pm

Hi,

When using inverted dropout for regularization, we already drop out and scale activations during the forward pass of every gradient descent iteration. Why do we need to explicitly set the switched-off dAs to 0 and then scale the remaining dAs in the backward pass? Shouldn’t that happen automatically as a result of the dropout in the forward pass?

My logic:

If we implement inverted dropout on AL then shouldn’t dAL already have the corresponding dropped elements zero’d and the remaining elements scaled up? After all, dAL is the gradient of AL.

Would be great if somebody could clarify this please!

Jaskeerat · May 24, 2021, 7:09pm

Hey! In forward prop, neurons are indeed zeroed out. But it is important to consider how that is implemented. We zero out the effect of the neurons not by actually removing them, but by multiplying the activations elementwise with a mask matrix of the same dimensions that contains some 0s. This makes some activations 0, as if those neurons had been turned off. Since we didn’t really change the network architecture and those neurons still exist, in backward propagation we once again have to explicitly zero out the gradients for the neurons that were turned off in the forward pass.

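d[L] = (np.random.rand(a[L].shape[0], a[L].shape[1]) < keep_prob) # np.random.rand draws uniform values in [0, 1), so d[L] is a matrix of 0s and 1s where each entry is 1 with probability keep_prob

a[L] = np.multiply(a[L], d[L]) # multiply the calculated activations by d[L]; activations that hit a 0 behave as if their neurons had been turned off

a[L] = a[L] / keep_prob # inverted dropout: scale the surviving activations up so their expected value is unchanged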

It is natural to think that if we turned off neurons, they should stay turned off for backprop. The key is to understand that we aren’t actually eliminating them or turning them off; we are only simulating that effect by multiplying by a matrix with some random 0s based on keep_prob.

The reason we simulate turning the neurons off by multiplying, rather than truly removing them, is that at test time we need all neurons to be turned on.
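To make the forward/backward pairing concrete, here is a minimal NumPy sketch for a single layer. The helper names `dropout_forward` and `dropout_backward` are made up for illustration and are not from the course code; the sketch only assumes that the mask created in the forward pass is cached and reused in the backward pass.

```python
import numpy as np

def dropout_forward(a, keep_prob, rng):
    """Inverted dropout: zero some activations, scale the rest by 1/keep_prob."""
    mask = rng.random(a.shape) < keep_prob  # True where the neuron is kept
    a_dropped = a * mask / keep_prob
    return a_dropped, mask                  # the mask must be cached for backprop

def dropout_backward(da_dropped, mask, keep_prob):
    """Apply the same mask and the same scaling to the incoming gradient."""
    return da_dropped * mask / keep_prob

rng = np.random.default_rng(0)
keep_prob = 0.8
a = np.ones((4, 5))                         # toy activations (hypothetical values)

a_dropped, mask = dropout_forward(a, keep_prob, rng)
da = dropout_backward(np.ones_like(a), mask, keep_prob)
```

Nothing in the backward pass knows about the mask unless we pass it in, which is why the zeroing and scaling have to be written out a second time.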

Nirvana_Laha · May 25, 2021, 11:38am

Hi,

Thanks for the prompt reply!

I think I understand it now. Even though Al may be set to zero, that doesn’t mean dAl will be zero as a result. The value of dAl depends on the layers to its right (it is calculated via the chain rule during backprop, from their weights and gradients) and will not work out to be zero just because Al has been set to 0. Hence, to simulate the effect of switching off the neuron, we have to explicitly set dAl to zero.

Is that the correct understanding?

Jaskeerat · May 25, 2021, 12:25pm

You’re absolutely right!
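A quick numerical check illustrates the point. This is only a sketch with a made-up two-layer fragment: `w_next` and `dz_next` are hypothetical stand-ins for the next layer's weights and the gradient arriving from it, filled with random values.

```python
import numpy as np

rng = np.random.default_rng(1)
keep_prob = 0.5

# Hypothetical sizes: layer l has 4 units, layer l+1 has 3 units, 6 examples.
a_l = rng.random((4, 6)) + 0.1           # strictly positive activations
mask = rng.random(a_l.shape) < keep_prob
mask[0, 0] = False                       # force at least one dropped entry
a_l_dropped = a_l * mask / keep_prob     # forward pass with inverted dropout

w_next = rng.standard_normal((3, 4))     # stand-in for W of layer l+1
dz_next = rng.standard_normal((3, 6))    # stand-in for dZ of layer l+1

# Backprop gives the gradient w.r.t. the dropped-out activations.
# It depends on w_next and dz_next, not on the values stored in a_l_dropped.
da_l = w_next.T @ dz_next

dropped_grads_nonzero = bool(np.any(da_l[~mask] != 0))

# Chain rule through a_l_dropped = a_l * mask / keep_prob:
da_l_masked = da_l * mask / keep_prob
```

Here `dropped_grads_nonzero` comes out true: the positions that were zeroed in the forward pass still receive gradient from the next layer, and only the explicit multiplication by `mask` removes it.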

Nirvana_Laha · May 25, 2021, 1:53pm

Great, thanks for clearing that up!
