Week 1 Exercise 6 - test doesn't pass

Hi all,

My code runs but doesn’t output the correct answer. Any clue? Thanks in advance for your help!

The question: forward_propagation_with_dropout

Implement forward propagation with dropout. You are using a 3-layer neural network and will add dropout to the first and second hidden layers. We will not apply dropout to the input layer or the output layer.

Instructions: You would like to shut down some neurons in the first and second layers. To do that, you are going to carry out four steps:

  1. In lecture, we discussed creating a variable 𝑑[1] with the same shape as 𝑎[1] using np.random.rand() to randomly get numbers between 0 and 1. Here, you will use a vectorized implementation, so create a random matrix 𝐷[1] = [𝑑[1](1) 𝑑[1](2) … 𝑑[1](𝑚)] of the same dimension as 𝐴[1].

  2. Set each entry of 𝐷[1] to be 1 with probability (keep_prob), and 0 otherwise.

Hint: Let’s say that keep_prob = 0.8, which means that we want to keep about 80% of the neurons and drop out about 20% of them. We want to generate a vector of 1’s and 0’s, where about 80% of them are 1 and about 20% are 0. This Python statement:

X = (X < keep_prob).astype(int)

is conceptually the same as this if-else statement (for the simple case of a one-dimensional array):

for i, v in enumerate(X):
    if v < keep_prob:
        X[i] = 1
    else:  # v >= keep_prob
        X[i] = 0

Note that X = (X < keep_prob).astype(int) works with multi-dimensional arrays, and the resulting output preserves the dimensions of the input array.

Also note that without using .astype(int), the result is an array of booleans (True and False), which NumPy automatically converts to 1 and 0 if we multiply it with numbers. (However, it’s better practice to convert data into the data type that we intend, so try using .astype(int).)
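For instance, here is a minimal demo of that conversion (the 2×3 shape and the seed are just illustrative, not from the assignment):

import numpy as np

np.random.seed(0)              # only so the demo is reproducible
X = np.random.rand(2, 3)       # uniform values in [0, 1)
print(X < 0.8)                 # boolean mask of True/False
print((X < 0.8).astype(int))   # same mask as 0/1 integers, same shape as X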

  3. Set 𝐴[1] to 𝐴[1] ∗ 𝐷[1]. (You are shutting down some neurons.) You can think of 𝐷[1] as a mask, so that when it is multiplied with another matrix, it shuts down some of the values.

  4. Divide 𝐴[1] by keep_prob. By doing this you are ensuring that the result of the cost will still have the same expected value as without dropout. (This technique is also called inverted dropout.)
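To see how the four steps fit together, here is a minimal sketch on a toy array (A1 here is an illustrative stand-in for the layer-1 activations, not the assignment's solution code):

import numpy as np

keep_prob = 0.8
np.random.seed(1)                               # reproducibility only
A1 = np.random.randn(3, 5)                      # stand-in for the activations A[1]

D1 = np.random.rand(A1.shape[0], A1.shape[1])   # Step 1: random matrix, same shape as A1
D1 = (D1 < keep_prob).astype(int)               # Step 2: 1 with probability keep_prob, else 0
A1 = A1 * D1                                    # Step 3: shut down the masked neurons
A1 = A1 / keep_prob                             # Step 4: inverted dropout scaling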

My answer

{moderator edit - solution code removed}

Notice that the shapes you are using for the “mask” matrices are derived from the bias vectors, so you end up treating each column of A the same way because of “broadcasting”. That’s not what the instructions intended: they tell you to make the shape of the mask exactly the shape of the corresponding activation matrix A, which means each sample gets its own pattern of dropped neurons.
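To illustrate the difference, here is a sketch with made-up shapes (3 units, 4 samples):

import numpy as np

np.random.seed(2)
A1 = np.random.randn(3, 4)   # 3 hidden units, 4 samples

# Mask shaped like the bias vector, (3, 1): broadcasting repeats it across
# the columns, so every sample drops exactly the same neurons.
D_shared = (np.random.rand(A1.shape[0], 1) < 0.8).astype(int)

# Mask shaped like A1 itself, (3, 4): each column (sample) gets its own
# pattern of dropped neurons, which is what the instructions expect.
D_per_sample = (np.random.rand(*A1.shape) < 0.8).astype(int)

print((A1 * D_shared).shape, (A1 * D_per_sample).shape)   # both (3, 4)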

This is an interesting point to consider: does it make sense to treat each sample in a given minibatch the same way or not? This topic has come up a number of times before; here’s a thread that discusses it in more detail and shows some experiments comparing the results of the two methods of implementing dropout.

Regardless of the conclusion of any research on the alternatives, in order to pass the tests here you have to make your code conform to the “each sample is handled differently” strategy.