Implementing dropout regularization

Hello,

I wonder if someone could explain why d3 is np.random.rand(a3.shape[0], a3.shape[1]) followed by d3 times a3, rather than d3 = np.random.rand(a3.shape[0], 1) and then d3 times a3. In other words, if the first neuron in layer 3 drops out, I assume the first row of the resulting A3 would be all 0s rather than a mix of 1s and 0s with some probability. Could someone point out where I go wrong, please? Thank you in advance.

Hey @X0450,
First of all, I would like to point out that in the assignment, the dropout needs to be added in the first and second layers only. Do review the note written in Exercise 3. Now, let's understand your query with the help of the example provided in the kernel itself. Here, I will consider the example of adding dropout in the first layer. Consider the shapes of the variables as follows (a quick shape check follows the list):

  • X → (3, m); where m denotes the batch size
  • W1 → (2, 3)
  • b1 → (2, 1)
  • Z1 → (2, m)
  • A1 → (2, m)
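
As a quick sanity check of these shapes, here is a minimal sketch (the ReLU activation and the batch size m = 4 are my assumptions for illustration):

```python
import numpy as np

np.random.seed(0)
m = 4                          # illustrative batch size (assumption)
X = np.random.randn(3, m)      # input: (3, m)
W1 = np.random.randn(2, 3)     # layer-1 weights: (2, 3)
b1 = np.zeros((2, 1))          # layer-1 bias: (2, 1)

Z1 = W1 @ X + b1               # (2, 3) @ (3, m) + (2, 1) -> (2, m)
A1 = np.maximum(0, Z1)         # assumed ReLU activation: (2, m)

print(Z1.shape, A1.shape)      # (2, 4) (2, 4)
```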

Now, the key point to note here is that a single column in A1 denotes a single example, and what we want is to apply a unique dropout pattern to each example, i.e., for the examples in a single batch, we want the neurons to be turned off differently.

Now, when we take D1 = np.random.rand(A1.shape[0], A1.shape[1]), it ensures the desired effect, i.e., different neurons are turned off for different examples, even within a single batch. But if we take D1 = np.random.rand(A1.shape[0], 1), then D1 will have shape (2, 1), and it will be broadcast to (2, m) so that D1 and A1 can be multiplied. In this case, the same neurons are turned off for every example, which is different from the desired outcome. The short sketch below makes the contrast concrete. I hope this helps.
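
A minimal sketch of the two masking options. The keep_prob value and the inverted-dropout scaling (dividing by keep_prob) are my assumptions for illustration; the point here is only the shape of the mask:

```python
import numpy as np

np.random.seed(1)
keep_prob = 0.8                 # illustrative value (assumption)
A1 = np.random.randn(2, 4)      # activations for a batch of m = 4 examples

# Mask with the full shape (2, m): every column (= example) gets its
# own dropout pattern, so different neurons are off for different examples.
D_full = np.random.rand(A1.shape[0], A1.shape[1]) < keep_prob
A_full = (A1 * D_full) / keep_prob

# Mask with shape (2, 1): broadcasting copies the same column across all
# m columns, so the SAME neurons are zeroed for every example. A zero in
# this mask wipes out an entire row of A1.
D_col = np.random.rand(A1.shape[0], 1) < keep_prob
A_col = (A1 * D_col) / keep_prob

print(D_full.astype(int))       # (2, 4): each column gets its own pattern
print(D_col.astype(int))        # (2, 1): a 0 here would zero a whole row
```

The all-zero row in the (2, 1) case is exactly the behavior described in the original question: dropping a neuron once drops it for the whole batch.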

Regards,
Elemento

That makes sense. Thank you very much!


This is an interesting question. Elemento has given a great answer, but there are some earlier threads that also discuss this point. The idea is that either solution is actually reasonable: using the same mask for every sample in the minibatch, or using a unique pattern for each sample. Here's a thread in which a fellow student runs some experiments comparing the two methods and shows the results.