From the week 1 video lecture on dropout, what I understood is that we shut down random neurons, where "shutting down" a neuron means setting its value to zero for all examples in the training set.

For instance, if we have 3 examples and 2 neurons in layer l, then A[l].shape = (2, 3) and A will look like:

```
[[A11, A12, A13],
 [A21, A22, A23]]
```

My assumption was that we would generate a binary mask of shape (2, 1), one 0/1 entry per neuron (each drawn with probability keep_prob), and apply it to each row. For instance, mask = [[0], [1]]. Multiplying A by this mask, with broadcasting, would then give (see the NumPy sketch below):

```
[[0, 0, 0],
 [A21, A22, A23]]
```
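In NumPy terms, here is a minimal sketch of what I had in mind (the values in A and the mask are made up for illustration, and the variable names are mine):

```python
import numpy as np

# 2 neurons in layer l, 3 training examples
A = np.array([[0.1, 0.2, 0.3],   # neuron 1 across the 3 examples
              [0.4, 0.5, 0.6]])  # neuron 2 across the 3 examples

# One 0/1 entry per neuron, shared by every example
mask = np.array([[0],
                 [1]])

# Broadcasting (2, 1) against (2, 3) zeroes neuron 1 in every column
print(A * mask)
# [[0.  0.  0. ]
#  [0.4 0.5 0.6]]
```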

But in practice, in the second assignment, the implementation generates a random value per neuron per example, i.e. a mask with the same shape as A. After masking, A can look like (again, see the sketch below):

```
[[0, A12, A13],
 [0, A22, 0]]
```
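And here is a sketch of the per-example masking as I understand it from the assignment, including the rescaling by keep_prob (inverted dropout); the seed and values are just for illustration:

```python
import numpy as np

np.random.seed(42)  # fixed seed so the illustration is reproducible

A = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])
keep_prob = 0.5

# One random 0/1 entry per neuron *per example*: D has the same shape as A
D = np.random.rand(*A.shape) < keep_prob
print(D.astype(int))
# [[1 0 0]
#  [0 1 1]]   -> a different neuron is dropped in each example

# Zero out the masked entries, then rescale the survivors so the
# expected value of the activations is unchanged (inverted dropout)
A = (A * D) / keep_prob
print(A)
# [[0.2 0.  0. ]
#  [0.  1.  1.2]]
```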

The neurons we shut down therefore differ from one example to the next.

Did I understand this correctly? If so, I was hoping to understand better why we don't shut down a given neuron across all examples at once.