The exercise says I should initialize D1 in the following way:

D1 = np.random.rand(A1.shape[0],A1.shape[1])

D1 = (D1<keep_prob).astype(int)

But my question is: what if I initialize D1 in the following way instead?

D1 = np.random.rand(A1.shape[0],A1.shape[1])

D1 = (D1>(1-keep_prob)).astype(int)

Will it make any difference to my model?

This is an interesting question! The point is that those two implementations have the same *statistical* behavior in terms of how many nodes are zeroed, but the actual nodes that get zeroed are different, right? But it turns out that the test cases here are written to expect that you use the first method.

Since all the behavior of dropout is fundamentally statistical, either of the implementations will have the same overall effect in actual use for training a model. But only the first one will pass the grader for this assignment.
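To make that concrete, here is a small sketch that builds both masks from the same random draw and compares them. The seed, the value of `keep_prob`, and the shape of `A1` are made up for illustration and are not taken from the assignment.

```python
import numpy as np

np.random.seed(1)                 # arbitrary seed, only so the run is repeatable
keep_prob = 0.8                   # hypothetical value, not the assignment's
A1 = np.random.randn(50, 1000)    # stand-in activations with an assumed shape

# One uniform draw in [0, 1), reused for both thresholding rules
U = np.random.rand(A1.shape[0], A1.shape[1])

D1_a = (U < keep_prob).astype(int)        # first method: keep where U < keep_prob
D1_b = (U > (1 - keep_prob)).astype(int)  # second method: keep where U > 1 - keep_prob

frac_a = D1_a.mean()                      # fraction of units kept by the first method
frac_b = D1_b.mean()                      # fraction of units kept by the second method
same_mask = np.array_equal(D1_a, D1_b)    # are the two masks identical element by element?

print("kept (first method): ", frac_a)
print("kept (second method):", frac_b)
print("identical masks:     ", same_mask)
```

Both fractions should come out close to `keep_prob`, while the masks themselves differ: wherever the draw falls below `1 - keep_prob`, the first method keeps the unit and the second drops it. Any check that compares against exact expected values will therefore only match one of the two.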
