Dropout is supposed to drop nodes in the hidden layer randomly on each iteration. Aren't we supposed to drop the same node for all m inputs in the training set?
The programming exercise required me to initialize the dropout matrix for the first hidden layer as follows:
D1 = np.random.rand(A1.shape[0], A1.shape[1])   # one random value per node per example
D1 = (D1 < keep_prob).astype(int)               # 1 = keep the node, 0 = drop it
This causes different nodes to be dropped for each of the m inputs. If we are supposed to drop the same nodes for all m examples, we should instead initialize D1 as a vector of shape (A1.shape[0], 1) and use broadcasting to multiply A1 by D1.
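For concreteness, here is a minimal sketch of the broadcasting alternative I mean (the inverted-dropout rescaling by keep_prob is my assumption of how the rest of the step would look):
D1 = np.random.rand(A1.shape[0], 1)   # one random value per hidden node
D1 = (D1 < keep_prob).astype(int)     # same keep/drop decision for all m examples
A1 = (A1 * D1) / keep_prob            # (n1, 1) mask broadcasts over the (n1, m) activations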
I tried to run my code by initializing D1 as a vector and using broadcasting, but the test cases were failing.
Please clarify.
Thanks,
Manish
Hi @manish_tiwari, the idea in dropout is to turn off nodes randomly for each sample.
If you turn off the same nodes for all m inputs, you will be optimizing a different, smaller NN, so your NN's performance will decrease significantly.
Even if you change the dropped nodes once per epoch, the end result will not be as good as selecting random nodes for each example.
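To make the distinction concrete, here is a small standalone illustration (the shapes, keep_prob and seed are just for the example, not from the assignment):
import numpy as np

rng = np.random.default_rng(0)
keep_prob, n_nodes, m = 0.8, 4, 3

# per-sample dropout: each example (column) keeps its own random subset of nodes
D_per_sample = (rng.random((n_nodes, m)) < keep_prob).astype(int)

# shared dropout: one decision per node, reused for every example in the iteration
D_shared = (rng.random((n_nodes, 1)) < keep_prob).astype(int)

print(D_per_sample)                            # columns generally differ
print(D_shared * np.ones((1, m), dtype=int))   # every column is identical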
I have the same question as @manish_tiwari, since the "Regularization" assignment description mentions "When you shut some neurons down, you actually modify your model."
Regarding the performance, I tried to implement using:
D1 = np.random.rand(A1.shape[0],1)
D2 = np.random.rand(A2.shape[0],1)
Using keep_prob = 0.9 and learning_rate = 0.3, the result is actually not worse (a fuller sketch of how I applied these masks follows the numbers below):
On the train set:
Accuracy: 0.933649289099526
On the test set:
Accuracy: 0.955
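For completeness, this is roughly how I applied those column-vector masks in the forward pass (my own reconstruction of the modified code, not the official assignment solution):
D1 = (np.random.rand(A1.shape[0], 1) < keep_prob).astype(int)
A1 = (A1 * D1) / keep_prob   # same nodes dropped for every example in this pass
D2 = (np.random.rand(A2.shape[0], 1) < keep_prob).astype(int)
A2 = (A2 * D2) / keep_prob
# the backward pass reuses the same masks, e.g. dA1 = (dA1 * D1) / keep_prob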
Yes, this issue has been noticed and discussed a number of times. Here's a thread with more investigations that reach the same conclusion you show: it probably doesn't make all that much difference whether you apply dropout differently per sample or consistently across all samples in a given minibatch.