Dropout is supposed to randomly drop nodes in the hidden layers for every example (i) and every iteration/epoch of gradient descent. However, the programming exercise has this line of code:
np.random.seed(1)
in the function forward_propagation_with_dropout(). This forces every iteration of gradient descent to produce the same random values for D1 and D2, so we are essentially dropping the same set of nodes in every iteration.
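To make this concrete, here is a minimal reproduction (just a simplified stand-in for the exercise's function, not the actual course code):

import numpy as np

def make_mask(shape, keep_prob=0.5):
    # Simplified stand-in for the mask creation inside
    # forward_propagation_with_dropout()
    np.random.seed(1)                          # the line in question
    return np.random.rand(*shape) < keep_prob

# Two calls, as in two consecutive iterations of gradient descent:
D_iter1 = make_mask((3, 4))
D_iter2 = make_mask((3, 4))
print(np.array_equal(D_iter1, D_iter2))        # True: identical masks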
I tried to run my code without this line, but the test cases were failing.
Thanks for getting back to me! Yes, I get that the seed is used for reproducibility.
However, forward_propagation_with_dropout() is called inside the gradient descent for loop in model(). On every iteration the seed is reset, so the resulting D1 and D2 are identical across all iterations. Within a single iteration, random nodes are indeed dropped for different examples and in different layers, but that same pattern repeats in every iteration.
In my opinion, the np.random.seed(1) line should instead be placed at the beginning of model(), before entering the gradient descent for loop. Then D1 and D2 would vary across iterations while the run as a whole remains reproducible.
For illustration purposes, please see the two code blocks below.
{1}
import numpy as np

for i in range(5):
    np.random.seed(1)          # seed reset inside the loop
    print(np.random.rand())

{2}
import numpy as np

np.random.seed(1)              # seed set once, before the loop
for i in range(5):
    print(np.random.rand())
In example {1}, all the outputs are identical, which is the same situation as in the current implementation of the dropout code. In example {2}, with the seed() call moved outside the for loop, the outputs are no longer identical.
I'm not sure if I'm missing something here or if I've misunderstood the concept of dropout. Please let me know.
I think you're right: D1 and D2 should vary in every iteration.
To be sure, we can print out D1 and D2 for a few iterations.
If this is the case, I can report it.
@tangw: Thanks for pointing this out! It is a really good point that I had not realized until just now. They probably put it there in the forward prop routine so that the test case for that one function would be consistent, but the easier way to achieve that would have been to set the seed in the test case routine itself.
So this means that all the training we do using the code as written does not really show the full power of dropout for regularization. Of course, in a "real world" system you would never set the seeds in any case; that would only ever be done for testing purposes.
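Just to sketch what "set the seed in the test case routine itself" could look like (the function and test names here are mine, not the actual course test code):

import numpy as np

def dropout_mask(shape, keep_prob):
    # The implementation itself is seed-free; controlling the
    # randomness is left to the caller.
    return (np.random.rand(*shape) < keep_prob).astype(float)

def test_dropout_mask():
    # The seed lives in the test harness, not in the function under
    # test, so grading stays reproducible while training stays random.
    np.random.seed(1)
    first = dropout_mask((2, 3), keep_prob=0.7)
    np.random.seed(1)
    second = dropout_mask((2, 3), keep_prob=0.7)
    assert np.array_equal(first, second), "same seed should give the same mask"

test_dropout_mask()
print("test passed")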
Thanks Bahadir. I actually tried printing out some entries of D1 and D2 across all the iterations, and the outputs are identical. Please let me know if you find the same thing.
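For anyone who wants to reproduce this, here is the kind of diagnostic I mean, with a simplified stand-in for the exercise's mask creation (names, shapes, and keep_prob are illustrative):

import numpy as np

def forward_prop_mask(shape, keep_prob=0.86):
    # Same seed placement as in the exercise's forward prop routine
    np.random.seed(1)
    return (np.random.rand(*shape) < keep_prob).astype(int)

for i in range(3):                    # a few "gradient descent iterations"
    D1 = forward_prop_mask((2, 5))
    print("iteration", i, ":", D1[0])
# All three printed rows are identical, i.e. the same nodes are
# dropped in every iteration.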
I have successfully completed the assignment by now. After submitting, I went back and reran the code without setting the seeds. The resulting train accuracy is 94.8% and test accuracy is 93.5%, compared to the original train accuracy of 92.9% and test accuracy of 95%.
Another point: if I don't set the seeds, the cost chart (cost vs. iterations x1000) no longer trends strictly down. This is consistent with what the professor said in the lecture video, since the cost function differs from iteration to iteration under dropout. We might want to turn off dropout and plot the cost chart to check that gradient descent is properly minimizing the cost (see the sketch below).
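As a tiny self-contained illustration of why the cost fluctuates under dropout (the network below is made up, with fixed weights, so that only the dropout mask changes between calls):

import numpy as np

np.random.seed(0)                        # fix the fake data only
A1 = np.random.rand(4, 10)               # fixed hidden-layer activations
W2 = np.random.randn(1, 4)               # fixed output weights
Y = (np.random.rand(1, 10) > 0.5).astype(float)

def cost(keep_prob):
    # Fresh inverted-dropout mask on every call, as during training
    D1 = np.random.rand(*A1.shape) < keep_prob
    A1d = A1 * D1 / keep_prob
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1d)))    # sigmoid output
    eps = 1e-8                                # avoid log(0)
    return float(-np.mean(Y * np.log(A2 + eps) + (1 - Y) * np.log(1 - A2 + eps)))

print([round(cost(0.86), 4) for _ in range(3)])  # varies from call to call
print([round(cost(1.0), 4) for _ in range(3)])   # constant: dropout off

With keep_prob = 1.0 the cost is the same every time, which is why the curve should trend down smoothly once dropout is turned off.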
Yes, I understand that this is done mostly for testing purposes. In a real-world system we would never set the seeds.