So in the second module's assignment, which is about dropout, I made the following changes:
In forward_propagation_with_dropout, the mask becomes:
D1 = np.random.rand(A1.shape[0], 1)
and I changed D2 for the next layer in the same way.
Coding Details
If A1 has shape (n, m), say n = 5 and m = 3:
d = np.random.rand(5, 1) < 0.5
d = d.astype(int)
We end up with something like:
array([[1],
       [1],
       [1],
       [0],
       [1]])
Thanks to NumPy broadcasting, A1 * d gives:
array([[0.28123521, 0.25551862, 0.25786374],
       [0.297053  , 0.0089446 , 0.98717618],
       [0.56646064, 0.60184241, 0.53826007],
       [0.        , 0.        , 0.        ],
       [0.67295579, 0.534168  , 0.05507231]])
So we are shutting off the same neuron for every sample: here the fourth row of A1 is zeroed across all three columns, instead of individual activations being dropped independently per sample.
Since D1 and D2 are already passed through the cache to the backpropagation function, nothing there needs to change.
I hope I got that right! It really is a minor change.
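To put the whole change in one place, here is a minimal sketch of how the masked forward pass could look. I am assuming the assignment's three-layer LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID structure and redefining the relu/sigmoid helpers so the snippet stands on its own; the function name is just mine for illustration.

import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward_propagation_with_dropout_masked(X, parameters, keep_prob=0.91):
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]
    W3, b3 = parameters["W3"], parameters["b3"]

    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    D1 = (np.random.rand(A1.shape[0], 1) < keep_prob).astype(int)  # one value per hidden unit
    A1 = A1 * D1 / keep_prob   # broadcasting zeroes whole rows; inverted-dropout rescaling

    Z2 = np.dot(W2, A1) + b2
    A2 = relu(Z2)
    D2 = (np.random.rand(A2.shape[0], 1) < keep_prob).astype(int)
    A2 = A2 * D2 / keep_prob

    Z3 = np.dot(W3, A2) + b3
    A3 = sigmoid(Z3)

    # D1 and D2 go into the cache exactly as before, so backprop stays unchanged
    cache = (Z1, D1, A1, W1, b1, Z2, D2, A2, W2, b2, Z3, A3, W3, b3)
    return A3, cache

The only real difference from the original version is the shape passed to np.random.rand; everything else, including the cache layout, stays the same.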
I would have said the two methods do almost the same thing, until I noticed something.
Here is the result of the masked version (np.random.rand(A1.shape[0], 1)) compared to the original version (np.random.rand(*A1.shape)), with everything kept fixed except the keep_prob hyperparameter:
Original Method:
keep_prob = 0.86
Cost after iteration 0: 0.6543912405149825
Cost after iteration 10000: 0.0610169865749056
Cost after iteration 20000: 0.060582435798513114

On the train set: Accuracy: 0.9289099526066351
On the test set: Accuracy: 0.95
Masked Method:
keep_prob = 0.91
Cost after iteration 0: 0.690437069943951
Cost after iteration 10000: 0.17615076457892637
Cost after iteration 20000: 0.16678299596707952

On the train set: Accuracy: 0.9383886255924171
On the test set: Accuracy: 0.96
So, as expected, this method is more aggressive and needs a higher keep_prob, i.e. it shuts off fewer neurons: dropping a whole neuron for every sample in the batch is a stronger perturbation than dropping scattered individual activations, so you compensate by dropping less often.
Keeping in mind that randomization alone can affect the results, I noticed it was much easier for me to find the keep_prob hyperparameter for the masked version. To make sure, I ran the program many times looking for a better keep_prob for both methods. Assuming whoever wrote the assignment had already found the best keep_prob for the original method, even landing on the value 0.86 was harder for me than tuning the masked version. So not only is the result a little better (at least on this dataset), finding the right value of keep_prob is easier too.
Here are the results for different values of keep_prob on both methods:
Note that I repeated this a few times and found similar results.
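For reference, here is roughly what the sweep looks like. model, train_X, train_Y, test_X and test_Y are the names from the assignment notebook; the accuracy helper is mine and evaluates with the masked forward pass from the sketch above, using keep_prob = 1 so no neuron is dropped at prediction time.

import numpy as np

def accuracy(X, Y, parameters):
    # evaluate without dropout: keep_prob = 1 keeps every neuron and skips the rescaling
    A3, _ = forward_propagation_with_dropout_masked(X, parameters, keep_prob=1.0)
    return float(np.mean((A3 > 0.5) == Y))

results = {}
for keep_prob in np.linspace(0.85, 0.96, 30):   # np.linspace(0.80, 0.91, 30) for the original version
    parameters = model(train_X, train_Y, keep_prob=keep_prob)
    results[keep_prob] = (accuracy(train_X, train_Y, parameters),
                          accuracy(test_X, test_Y, parameters))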
Accuracy for 30 different values of keep_prob:
Original version
# keep_prob: (train set accuracy, dev set accuracy)
0.8: (0.919431279620853, 0.93),
0.8037931034482759: (0.9146919431279621, 0.92),
0.8075862068965518: (0.919431279620853, 0.925),
0.8113793103448276: (0.933649289099526, 0.945),
0.8151724137931035: (0.9383886255924171, 0.945),
0.8189655172413793: (0.9383886255924171, 0.94),
0.8227586206896552: (0.9383886255924171, 0.95),
0.8265517241379311: (0.933649289099526, 0.925),
0.830344827586207: (0.9289099526066351, 0.925),
0.8341379310344827: (0.9289099526066351, 0.93),
0.8379310344827586: (0.9289099526066351, 0.925),
0.8417241379310345: (0.9241706161137441, 0.92),
0.8455172413793104: (0.933649289099526, 0.925),
0.8493103448275863: (0.9383886255924171, 0.93),
0.8531034482758622: (0.9289099526066351, 0.93),
0.8568965517241379: (0.9289099526066351, 0.95),
0.8606896551724138: (0.9289099526066351, 0.95),
0.8644827586206897: (0.919431279620853, 0.94),
0.8682758620689656: (0.933649289099526, 0.94),
0.8720689655172414: (0.9289099526066351, 0.935),
0.8758620689655172: (0.9241706161137441, 0.935),
0.8796551724137931: (0.919431279620853, 0.945),
0.883448275862069: (0.9146919431279621, 0.945),
0.8872413793103449: (0.919431279620853, 0.94),
0.8910344827586207: (0.933649289099526, 0.95),
0.8948275862068966: (0.909952606635071, 0.935),
0.8986206896551725: (0.9052132701421801, 0.92),
0.9024137931034483: (0.9289099526066351, 0.955),
0.9062068965517242: (0.933649289099526, 0.95),
0.91: (0.9289099526066351, 0.925)
# on average:
# (0.92685624, 0.936 )
Masked version
# keep_prob: (train set accuracy, dev set accuracy)
0.85: (0.9383886255924171, 0.955),
0.8537931034482759: (0.9383886255924171, 0.96),
0.8575862068965517: (0.957345971563981, 0.93),
0.8613793103448275: (0.9620853080568721, 0.935),
0.8651724137931034: (0.957345971563981, 0.935),
0.8689655172413793: (0.9620853080568721, 0.935),
0.8727586206896552: (0.9620853080568721, 0.935),
0.876551724137931: (0.9289099526066351, 0.945),
0.8803448275862069: (0.9289099526066351, 0.945),
0.8841379310344827: (0.9289099526066351, 0.945),
0.8879310344827586: (0.9289099526066351, 0.945),
0.8917241379310344: (0.9289099526066351, 0.945),
0.8955172413793103: (0.9289099526066351, 0.945),
0.8993103448275862: (0.9289099526066351, 0.945),
0.9031034482758621: (0.9289099526066351, 0.95),
0.9068965517241379: (0.9289099526066351, 0.95),
0.9106896551724137: (0.9383886255924171, 0.96),
0.9144827586206896: (0.9289099526066351, 0.96),
0.9182758620689655: (0.9383886255924171, 0.955),
0.9220689655172414: (0.9383886255924171, 0.96),
0.9258620689655173: (0.943127962085308, 0.955),
0.929655172413793: (0.933649289099526, 0.955),
0.9334482758620689: (0.9383886255924171, 0.955),
0.9372413793103448: (0.9383886255924171, 0.955),
0.9410344827586207: (0.9478672985781991, 0.95),
0.9448275862068966: (0.933649289099526, 0.955),
0.9486206896551723: (0.933649289099526, 0.955),
0.9524137931034482: (0.9289099526066351, 0.955),
0.9562068965517241: (0.9383886255924171, 0.96),
0.96: (0.9289099526066351, 0.95)
# on average:
# (0.9382306477093206, 0.949333333333333)
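The averages are just the column means over the 30 runs; with the results dict from the sweep sketch above, something like:

train_avg = float(np.mean([train for train, dev in results.values()]))
dev_avg = float(np.mean([dev for train, dev in results.values()]))
print(train_avg, dev_avg)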