Question about weight updates for dropped out neurons

Based on the backprop equations from Course 1, if the ith neuron of the (l-1)th layer is dropped out, does that mean that the ith column of dW^{[l]} is equal to zero? If so, does this mean that the weights associated with neurons whose activations are dropped out are unchanged during that particular weight update?

[Image: backprop equations from Course 1]

Hi @LuBinLiu,

Your intuition is mostly correct, although it's not that dW^{[l]} is zeroed; it's more that it's ignored.
I see it like this: in every iteration of training, instead of the full network you'll have a thinned version of it (i.e. a sub-network), which is the result of "ignoring" some connections (based on the dropout rate) in the full network. Only the units in this thinned version of the network participate in both forward and back propagation, meaning that the weights and biases of the ignored units will not get updated in that training step.

In each training step (i.e. each mini-batch), you'll have a different thinned version of the full network.

In testing, you’ll use the full network.
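
Just to make that concrete, here's a minimal NumPy sketch of an inverted-dropout forward step, along the lines of what the course presents (the function name `dropout_forward` and the shapes are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(A_prev, keep_prob):
    """Apply inverted dropout to the activations of one layer.

    A_prev    -- activations a^[l-1], shape (n_units, m_examples)
    keep_prob -- probability of keeping a unit (1 - dropout rate)
    """
    # Bernoulli mask r: each entry is 1 with probability keep_prob
    D = (rng.random(A_prev.shape) < keep_prob).astype(A_prev.dtype)
    # Zero out dropped units; dividing by keep_prob keeps the expected
    # activation unchanged, which is why the full network can be used
    # as-is at test time, with no mask
    A_tilde = A_prev * D / keep_prob
    return A_tilde, D  # D is cached and reused in backprop
```

Backprop then applies the same mask D (and the same division by keep_prob) to dA^{[l-1]}, which is exactly what zeroes out the gradient contributions of the dropped units.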

I recommend you take a look at the original paper, which reads nicely!

Dropout: A Simple Way to Prevent Neural Networks from Overfitting, by Srivastava, Hinton, Krizhevsky, Sutskever, and Salakhutdinov.

Hope that helps!


From the paper, it seems like dropout elementwise-multiplies the output activations of a layer by a vector r of i.i.d. Bernoulli random variables (i.e. \tilde a^{[l-1]} = r * a^{[l-1]}). Does this mean that I can re-use the equations from the image I posted, replacing all instances of a with \tilde a (i.e. dW^{[2]} = dz^{[2]} \tilde a^{[1]T} or dz^{[2]} = \tilde a^{[2]} - y)? If so, computationally, does this mean that the ith column of dW^{[2]} is 0 if the ith element of \tilde a^{[1]} is 0 (dropped out)?

Yes, you could! Since each entry of r is 1 with probability p and 0 otherwise, some activations will be zeroed and won't count towards the forward propagation. Having said that, I don't think the tests in the assignment will pass, as they don't expect this. I'd still give it a try and see how it goes.

Indeed. For the weights belonging to dropped units, the corresponding columns of dW^{[l]} would be 0, hence those weights not being updated during that step.
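
If it helps, here's a tiny NumPy check of that claim for the single-example equations you wrote (toy sizes; the variable names are mine, and I'm using the course's inverted-dropout scaling by keep_prob, though the paper's unscaled r * a gives the same zero columns):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 4, 3                        # toy layer widths, single example

a1 = rng.random((n1, 1))             # a^[1]
dz2 = rng.random((n2, 1))            # dz^[2] (= a^[2] - y at the output layer)

keep_prob = 0.5
r = rng.random((n1, 1)) < keep_prob  # Bernoulli keep/drop vector
a1_tilde = a1 * r / keep_prob        # \tilde a^[1] (inverted dropout)

dW2 = dz2 @ a1_tilde.T               # dW^[2] = dz^[2] \tilde a^[1]T

# Every column of dW2 that corresponds to a dropped unit is exactly zero:
dropped = ~r.ravel()
print(np.all(dW2[:, dropped] == 0))  # True
```

One small caveat: in a vectorized mini-batch, each example gets its own mask column, so an entire column of dW^{[2]} is zero only if that unit happened to be dropped for every example in the batch; each individual example's contribution to that column is still zero, though.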


