From the week 1 video lecture on dropout, my understanding is that we shut down random neurons. Shutting down a neuron means setting that neuron's value to zero for all examples in the training set.
For instance, if we have 3 examples and 2 neurons in layer l, then A[l].shape = (2, 3), and A will look like:
[[ A11 A12 A13],
[A21 A22 A23]]
My assumption was that we would generate a (2, 1) mask, i.e. one 0/1 entry per neuron, and apply it to every column. For instance, mask = [[0], [1]]. Multiplying A by this mask with broadcasting would give:
[[0 0 0],
[A21 A22 A23]].
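In numpy terms, this is what I had in mind (just a sketch of my own assumption, with made-up variable names, not code from the assignment):

import numpy as np

np.random.seed(0)
A = np.random.rand(2, 3)        # activations A[l]: 2 neurons (rows) x 3 examples (columns)
mask = np.array([[0], [1]])     # one 0/1 entry per neuron, shape (2, 1)
A_dropped = A * mask            # broadcasting zeroes out neuron 0 in every column, i.e. for every example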
But in practice, in the second assignment, the implementation generates an independent random value per neuron per example. For instance, A can look like:
[[0 A12 A13],
[0 A22 0]]
Therefore, the neurons we shut down are different from one example to another.
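My reading of the assignment's approach is closer to this sketch of inverted dropout (again, the variable names here are my own):

import numpy as np

np.random.seed(1)
keep_prob = 0.8
A = np.random.rand(2, 3)                                  # activations: 2 neurons x 3 examples
D = np.random.rand(A.shape[0], A.shape[1]) < keep_prob    # independent 0/1 mask per neuron per example
A_dropped = A * D                                         # different neurons are zeroed in different columns
A_dropped = A_dropped / keep_prob                         # rescale so the expected activation stays the same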
Did I understand this correctly? If so, I'd like to better understand why we don't shut down a neuron across all examples.