C2W1 Dropout Regularization clarification

This is with reference to the “Forward Propagation with Dropout” instructions provided in the C2W1 Regularization exercise.

The exercise instructions state the following:

Hint: Let’s say that keep_prob = 0.8, which means that we want to keep about 80% of the neurons and drop out about 20% of them. We want to generate a vector that has 1’s and 0’s, where about 80% of them are 1 and about 20% are 0. This python statement:
X = (X < keep_prob).astype(int)

is conceptually the same as this if-else statement (for the simple case of a one-dimensional array) :

for i,v in enumerate(x):
    if v < keep_prob:
        x[i] = 1
    else: # v >= keep_prob
        x[i] = 0

Note that the X = (X < keep_prob).astype(int) works with multi-dimensional arrays, and the resulting output preserves the dimensions of the input array.
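The equivalence described in the hint can be checked directly. The following is a minimal sketch, not part of the exercise: the seed, the array sizes, and the names `vectorized`, `looped`, and `mask` are chosen only for illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed, only so the run is repeatable
keep_prob = 0.8

# One-dimensional case: compare the vectorized statement with the loop.
x = np.random.rand(10)
vectorized = (x < keep_prob).astype(int)

looped = np.zeros_like(x, dtype=int)
for i, v in enumerate(x):
    looped[i] = 1 if v < keep_prob else 0

same_result = np.array_equal(vectorized, looped)

# Multi-dimensional case: the mask keeps the shape of the input.
X = np.random.rand(3, 4)
mask = (X < keep_prob).astype(int)

print(same_result, mask.shape)
```

Both approaches should give identical 0/1 vectors, and the two-dimensional mask should have the same shape as the array it was built from.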

My queries:

  • The above Python statement might not always produce approximately 20% 0s and 80% 1s. We initialize D1 with np.random.rand, which draws from a uniform distribution between 0 and 1, but the resulting values will not always give approximately 80% 1s. Am I missing something here?
  • I experimented with the following code, and the percentage of 1s varies widely from run to run, which matches my understanding.

import numpy as np

keep_prob = 0.8  # we want approx. 80% of the entries of X to be 1s and 20% to be 0s

for _ in range(10):
    # X is similar to D1
    X = np.random.rand(2, 3)
    # Convert entries of X to 0 or 1 (using keep_prob as the threshold)
    X = (X < keep_prob).astype(int)
    # Count the number of 1s in the matrix
    num_ones = np.count_nonzero(X == 1)
    # Calculate the percentage of 1s in the matrix
    percentage_ones = (num_ones / X.size) * 100
    print("Percentage of 1s in the matrix: {:.2f}%".format(percentage_ones))

I have searched for and read responses to Dropout regularization questions in the forum, but I am still not clear. I would appreciate your help.

PS: There is no exercise answer code here.

Hi @ssvinoth,

Try replacing X = np.random.rand(2,3) with X = np.random.rand(200,300), and share with us whether the result is approximately 80%.

The replacement changes how many random numbers are generated. Based on what you observe with that (and any other) replacement, can you conclude when the algorithm produces the desired outcome more often, and when less often?


Thanks @rmwkwok. You are right: when X is bigger (in my experiments, anything larger than 20 x 20), the percentage of ones is approximately 80%. So the statement works as intended in the exercise :slight_smile:
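For anyone who wants to repeat the comparison across sizes, here is a rough sketch of such an experiment. The helper `kept_fraction_spread`, the three shapes, the seed, and the number of trials are assumptions made for illustration; they are not taken from the exercise.

```python
import numpy as np

np.random.seed(0)  # fixed seed, only so the run is repeatable
keep_prob = 0.8


def kept_fraction_spread(shape, n_trials=500):
    """Return the mean and standard deviation of the fraction of 1s
    observed over n_trials freshly generated masks of the given shape."""
    fractions = np.empty(n_trials)
    for t in range(n_trials):
        mask = (np.random.rand(*shape) < keep_prob).astype(int)
        fractions[t] = mask.mean()  # fraction of entries equal to 1
    return fractions.mean(), fractions.std()


results = {}
for shape in [(2, 3), (20, 20), (200, 300)]:
    results[shape] = kept_fraction_spread(shape)
    mean, std = results[shape]
    print(f"shape={shape}: mean fraction={mean:.3f}, std={std:.4f}")
```

The mean fraction should sit near 0.8 for every shape, while the spread around it should shrink noticeably as the matrix gets larger.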

Hello @ssvinoth,

So you found it! Great work experimenting!

You see, probability is really a statistical thing. When our layer has only one neuron, i.e. X = np.random.rand(1,1), that neuron is either turned ON or turned OFF, so the observed percentage is always either 100% or 0%, and can never be 80%. Similarly, if we have only two neurons, it can only be 0%, 50%, or 100%.

However, even with only one neuron per layer, if we observe many such cases, say 100 layers with one neuron each, then it is quite likely that approximately 80% of those neurons are turned ON.

Now that we know the observed percentage is more likely to be close to the required keep_prob when we are counting more neurons, is that good enough? Well, think about when we usually use Dropout.

We use Dropout for regularization, we use regularization when our model is overfitting, and our model is more likely to overfit when our neural network is too large. See? Too large. When it is too large, there are more neurons to count, so the observed percentage is more likely to be close to the required keep_prob.
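To put a rough number on "more likely", here is a sketch that assumes each of the n mask entries is kept independently with probability p = keep_prob, which is what comparing np.random.rand values against keep_prob does. The symbols n, p, K (the number of 1s) and \hat{p} (the observed fraction of 1s) are introduced here only for illustration.

```latex
K \sim \mathrm{Binomial}(n, p), \qquad
\hat{p} = \frac{K}{n}, \qquad
\mathbb{E}[\hat{p}] = p, \qquad
\operatorname{sd}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
```

With p = 0.8, the typical deviation from 80% is about 0.16 for the 2 x 3 mask (n = 6), about 0.02 for a 20 x 20 mask (n = 400), and about 0.0016 for a 200 x 300 mask (n = 60000).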