Based on the backprop equations from Course 1, if the `i`th neuron of the `l-1`th layer is dropped out, does that mean that the `i`th column of `dW^[l]` is equal to zero? If so, does this mean that the weights associated with neurons whose activations are dropped out are unchanged during that particular weight update?

Hi @LuBinLiu,

Your intuition is mostly correct, although it's not that dW^[l] is zeroed; it's more that it's ignored.

I see it like this: every iteration of training, instead of the full network, you'll have a thinned version of it (i.e. a sub-network), which is the result of "ignoring" a few connections (based on the dropout rate) from the full network. Only the units in this thinned version of the network participate in both forward and back propagation, meaning that the weights and biases of units that were ignored will not get updated in that training step.

In each training step (i.e. each mini-batch), you'll have a different thinned version of the full network.

In testing, you'll use the full network.
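A minimal NumPy sketch of this idea (inverted dropout, as in the course assignment; the function name and shapes here are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_forward(a_prev, W, b, keep_prob, train=True):
    """One layer's forward pass with (inverted) dropout on its input.

    During training, each input activation is kept with probability
    keep_prob and scaled by 1/keep_prob; at test time the full network
    is used, with no mask and no scaling.
    """
    if train:
        r = rng.random(a_prev.shape) < keep_prob   # Bernoulli mask, redrawn each step
        a_prev = a_prev * r / keep_prob            # "thinned" input activations
    z = W @ a_prev + b
    return np.maximum(0, z)                        # ReLU activation

# Hypothetical shapes: 3 units feeding a 2-unit layer, batch of 4 examples.
a1 = rng.random((3, 4))
W2, b2 = rng.standard_normal((2, 3)), np.zeros((2, 1))
a2_train = layer_forward(a1, W2, b2, keep_prob=0.8, train=True)
a2_test  = layer_forward(a1, W2, b2, keep_prob=0.8, train=False)
```

Because the mask `r` is redrawn every call, each training step effectively runs a different sub-network, while the test-time path always uses the full weights.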

I recommend you take a look at the original paper, which reads nicely!

Dropout: A Simple Way to Prevent Neural Networks from Overfitting, by Srivastava, Hinton, Krizhevsky, Sutskever and Salakhutdinov.

Hope that helps!

From the paper it seems like dropout elementwise-multiplies the output activations of a layer by a vector r of i.i.d. Bernoulli random variables (i.e. \tilde{a}^{[l-1]} = r * a^{[l-1]}). Does this mean I can re-use the equations from the image I posted, replacing all instances of a with \tilde{a} (i.e. dW^{[2]} = dz^{[2]} \tilde{a}^{[1]T} or dz^{[2]} = \tilde{a}^{[2]} - y)? If so, computationally, does this mean that the i-th column of dW^{[2]} is 0 if the i-th element of \tilde{a}^{[1]} is 0 (dropped out)?

Yes, you could! Given that r could be 0 or 1 with some probability p, then some activations will be zero and not count towards the forward propagation. Having said that, I don't think the tests in the assignment will pass, as they don't expect this. I'd still give it a try and see how it goes.

Indeed. For the weights belonging to dropped units, the corresponding column of dW^{[2]} would be 0, hence those weights not being updated during that step.
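A quick NumPy check of that zero column (the shapes and values here are made up; the point is that dropping unit i of layer 1 zeroes column i of dW^{[2]}):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-layer setup: layer 1 has 3 units, layer 2 has 2, batch of 4.
a1 = rng.random((3, 4))                # a^{[1]}, activations of layer 1
r  = np.array([[1.0], [0.0], [1.0]])   # dropout mask: drop unit i = 1
a1_tilde = a1 * r                      # \tilde{a}^{[1]}, row 1 is all zeros

dz2 = rng.standard_normal((2, 4))      # upstream gradient dz^{[2]}
dW2 = dz2 @ a1_tilde.T / 4             # dW^{[2]} = (1/m) dz^{[2]} \tilde{a}^{[1]T}

# Column 1 of dW^{[2]} is exactly zero, so W^{[2]}[:, 1] is untouched by
# the update W^{[2]} -= learning_rate * dW^{[2]} in this step.
print(dW2[:, 1])
```

The other columns are (almost surely) nonzero, so only the weights fanning out of the dropped unit sit out this particular update.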
