Question about weight updates for dropped out neurons

Based on the backprop equations from Course 1, if the ith neuron of the (l-1)th layer is dropped out, does that mean that the ith column of dW^{[l]} is equal to zero? If so, does this mean that the weights associated with neurons whose activations are dropped out are unchanged during that particular weight update?

[Image: backprop equations from Course 1]

Hi @LuBinLiu,

Your intuition is mostly correct, although it's not that dW^{[l]} is zeroed; it's more that it's ignored.
I see it like this: in every iteration of training, instead of the full network you'll have a thinned version of it (i.e. a sub-network), which is the result of "ignoring" some connections (based on the dropout rate) in the full network. Only the units in this thinned version of the network participate in both forward and back propagation, meaning that the weights and biases of the ignored units will not get updated in that training step.

In each training step (i.e. each mini-batch), you'll have a different thinned version of the full network.

In testing, you’ll use the full network.
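
Just to make that concrete, here's a minimal NumPy sketch of an inverted-dropout forward step, along the lines of what the course presents (the function name `dropout_forward` and the shapes are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(A_prev, keep_prob):
    """Apply inverted dropout to the activations of one layer.

    A_prev    -- activations a^[l-1], shape (n_units, m_examples)
    keep_prob -- probability of keeping a unit (1 - dropout rate)
    """
    # Bernoulli mask r: each entry is 1 with probability keep_prob
    D = (rng.random(A_prev.shape) < keep_prob).astype(A_prev.dtype)
    # Zero out dropped units; dividing by keep_prob keeps the expected
    # activation unchanged, which is why the full network can be used
    # as-is at test time, with no mask
    A_tilde = A_prev * D / keep_prob
    return A_tilde, D  # D is cached and reused in backprop
```

Backprop then applies the same mask D (and the same division by keep_prob) to dA^{[l-1]}, which is exactly what zeroes out the gradient contributions of the dropped units.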

I recommend you take a look at the original paper, which reads nicely!

Dropout: A Simple Way to Prevent Neural Networks from Overfitting, by Srivastava, Hinton, Krizhevsky, Sutskever, and Salakhutdinov.

Hope that helps!


From the paper, it seems like dropout elementwise-multiplies the output activations of a layer by a vector r of i.i.d. Bernoulli random variables (i.e. \tilde a^{[l-1]} = r * a^{[l-1]}). Does this mean that I can re-use the equations from the image I posted, replacing all instances of a with \tilde a (i.e. dW^{[2]} = dz^{[2]} \tilde a^{[1]T} or dz^{[2]} = \tilde a^{[2]} - y)? If so, computationally, does this mean that the ith column of dW^{[2]} is 0 if the ith element of \tilde a^{[1]} is 0 (dropped out)?

Yes, you could! Since each entry of r is 1 with probability p and 0 otherwise, some activations will be zeroed and won't count towards the forward propagation. Having said that, I don't think the tests in the assignment will pass, as they don't expect this. I'd still give it a try and see how it goes.

Indeed. For the weights belonging to dropped units, the corresponding columns of dW^{[l]} would be 0, hence those weights not being updated during that step.
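
If it helps, here's a tiny NumPy check of that claim for the single-example equations you wrote (toy sizes; the variable names are mine, and I'm using the course's inverted-dropout scaling by keep_prob, though the paper's unscaled r * a gives the same zero columns):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 4, 3                        # toy layer widths, single example

a1 = rng.random((n1, 1))             # a^[1]
dz2 = rng.random((n2, 1))            # dz^[2] (= a^[2] - y at the output layer)

keep_prob = 0.5
r = rng.random((n1, 1)) < keep_prob  # Bernoulli keep/drop vector
a1_tilde = a1 * r / keep_prob        # \tilde a^[1] (inverted dropout)

dW2 = dz2 @ a1_tilde.T               # dW^[2] = dz^[2] \tilde a^[1]T

# Every column of dW2 that corresponds to a dropped unit is exactly zero:
dropped = ~r.ravel()
print(np.all(dW2[:, dropped] == 0))  # True
```

One small caveat: in a vectorized mini-batch, each example gets its own mask column, so an entire column of dW^{[2]} is zero only if that unit happened to be dropped for every example in the batch; each individual example's contribution to that column is still zero, though.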


