Does the dropout implementation shut down random features and not random neurons?

From the week 1 video lecture on dropout, my understanding is that we shut down random neurons. Shutting down a neuron means setting its value to zero for all examples in the training set.

For instance, if we have 3 examples and 2 neurons in layer l, then A[l].shape = (2, 3), and A looks like:
[[A11 A12 A13],
 [A21 A22 A23]]

My assumption was that we would generate a mask of shape (2, 1), one entry per neuron, and apply it to each row. For instance, mask = [[0], [1]]. Multiplying A by this mask, with broadcasting, results in:
[[0   0   0],
 [A21 A22 A23]]
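That hypothesized per-neuron masking can be sketched in NumPy (the activation values and mask entries here are just illustrative):

```python
import numpy as np

# 2 neurons (rows) x 3 examples (columns)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# hypothetical per-neuron mask of shape (2, 1): drop neuron 0, keep neuron 1
mask = np.array([[0.0],
                 [1.0]])

# broadcasting applies each mask entry across its whole row,
# so a dropped neuron is zeroed for every example at once
masked = A * mask
# masked == [[0. 0. 0.],
#            [4. 5. 6.]]
```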

But in practice, in the second assignment, the implementation generates a random value per neuron per example. For instance, A can look like:
[[0 A12 A13],
 [0 A22 0]]

Therefore, the neurons we shut down are different from one example to another.
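A minimal sketch of this assignment-style masking (inverted dropout), assuming a scalar keep_prob and illustrative activation values:

```python
import numpy as np

np.random.seed(42)   # fixed seed so the example is reproducible
keep_prob = 0.5      # probability of keeping each entry

# 2 neurons (rows) x 3 examples (columns)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# one independent Bernoulli draw per (neuron, example) entry
D = np.random.rand(*A.shape) < keep_prob
# D == [[1 0 0],
#       [0 1 1]]  -> a different set of neurons is dropped in each column

# zero out the dropped entries and scale up the survivors (inverted dropout)
A_dropped = (A * D) / keep_prob
```

Because D has one draw per entry of A, each column (example) ends up with its own dropout pattern.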

Did I understand this correctly? If so, I was hoping to better understand why we don't shut down the same neurons for all examples.


Yes, this is correct. For each training example, you sample a thinned network by randomly dropping out neurons.

You’re trying to approximate the effect of averaging the predictions of many different neural networks to prevent overfitting (source), so you wouldn’t want to use the same network for all the training examples :slight_smile:
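One way to see the averaging intuition numerically: with inverted dropout, the expected value of a masked activation equals the original activation, so averaging the output over many sampled masks recovers it. A quick check (keep_prob and the activation values are assumed for illustration):

```python
import numpy as np

np.random.seed(1)
keep_prob = 0.5
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# sample many independent dropout masks at once: shape (n, 2, 3)
n = 100_000
D = np.random.rand(n, *A.shape) < keep_prob

# average the inverted-dropout outputs over all sampled masks
avg = ((A * D) / keep_prob).mean(axis=0)

# avg is close to the original A: the ensemble of thinned networks
# averages out to the behavior of the full (undropped) network
```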