When using inverted dropout, we follow these steps:

*1. Compute dropout matrix D*

```
D = np.random.rand(*A.shape) < keep_prob  # True with probability keep_prob
D = D.astype(int)                         # mask of zeros and ones, same shape as A
```

*2. Update the activation matrix by zeroing some elements*

```
A = A * D  # element-wise product: keep entries where D is 1, zero out the rest
```

*3. Rescale the activation matrix*

```
A = A / keep_prob  # rescale so the expected sum of activations is unchanged
```
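Putting the three steps together, a minimal runnable sketch (the shape of `A` and the value of `keep_prob` are made up for illustration):

```
import numpy as np

np.random.seed(0)
keep_prob = 0.8
A = np.random.rand(4, 3)                                 # hypothetical activations

D = (np.random.rand(*A.shape) < keep_prob).astype(int)   # step 1: dropout mask
A = A * D                                                # step 2: zero dropped units
A = A / keep_prob                                        # step 3: inverted-dropout rescaling
```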

My question: would it not be more accurate to divide by the actual “activation reduction factor for this layer”? I.e., if we dropped 3 out of 12 activations, we should scale by a factor of 12/9 rather than 1/keep_prob.

For example, the rescaling step would instead be written as:

```
A = A * A.size / np.sum(D)  # scale by (total units) / (units actually kept)
```
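One caveat with this variant (my own note, not from the assignment): `np.sum(D)` can be zero in the unlikely case that every unit is dropped, so a guard is needed, e.g.:

```
kept = np.sum(D)
A = A * A.size / kept if kept > 0 else np.zeros_like(A)  # avoid division by zero
```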

I tried it in the Week 1 Regularization assignment, and my results are similar (i.e., 92.45% on the training set and 95% on the test set).

However, I suspect that this method might be a little more precise, especially when keep_prob is close to 1:

Below is the output when I change keep_prob to 0.9 with the original factor (`1/keep_prob`):

And below is the output when I use the proposed factor (`A.size / np.sum(D)`):
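One way to see why the two runs come out so close: the proposed factor `A.size / np.sum(D)` is a random quantity that concentrates around `1/keep_prob` as the layer grows. A standalone sketch (not part of the assignment) comparing the two factors:

```
import numpy as np

np.random.seed(1)
keep_prob = 0.9
for n in (12, 1200, 120000):
    D = (np.random.rand(n) < keep_prob).astype(int)
    proposed = D.size / np.sum(D)  # realized scaling factor for this draw
    print(n, round(1 / keep_prob, 4), round(proposed, 4))
```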