Confusion about dimension of Dropout matrix

ekylberg · January 4, 2022, 3:52am

I had quite a bit of trouble with the forward_propagation_with_dropout function until I realized I was providing the wrong dimensions for D1 and D2. My initial assumption was that D1 and D2 would be column vectors, e.g. of shape (A2.shape[0], 1), not full 2-dimensional arrays.

However, given the supporting materials, I’m confused why this is the expectation in the assignment. At numerous points in the lecture and in the notebook, dropout was described as “removing hidden units”, and all illustrations pointed to dropout networks being dense. But, if D1 and D2 are 2-dimensional arrays matching the dimension of the weight matrices, isn’t Dropout actually removing edges and not nodes from the network?

If my question is unclear, please let me know, and I will try to provide a visual diagram.

paulinpaloalto · January 4, 2022, 9:00pm

It is a good question and an interesting point. I forget exactly what Prof Ng says in the lectures, but it is quite clear in the instructions for the assignment that he expects the mask values to be matrices, not vectors. So the effect is that we treat each sample in a given minibatch differently w.r.t. dropout. We are still removing nodes, but different ones for each sample. So it’s effectively as if we were doing Stochastic Gradient Descent w.r.t. the way dropout is applied. My intuition is that this makes the effect weaker for a given keep_prob value. There have been some interesting past discussions of this point, e.g. this one. Please have a look at that discussion and see if it sheds any further light.
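To make the difference concrete, here is a minimal NumPy sketch of the two mask shapes. It is not the assignment's code: the sizes, the seed, keep_prob = 0.8 and the use of np.random.default_rng are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for this sketch only
keep_prob = 0.8                  # hypothetical value

# Pretend activations: 4 hidden units (rows) x 5 samples (columns).
# Values are strictly positive so that a zero can only come from the mask.
A1 = rng.random((4, 5)) + 0.1

# Matrix mask, same shape as A1: each sample gets its own dropped units.
D1 = rng.random(A1.shape) < keep_prob
A1_per_sample = A1 * D1 / keep_prob   # inverted dropout scaling

# Column-vector mask, shape (4, 1): broadcasting applies the same
# dropped units to every sample in the batch.
d1 = rng.random((A1.shape[0], 1)) < keep_prob
A1_shared = A1 * d1 / keep_prob

print(D1.astype(int))          # zeros scattered independently per column
print(d1.astype(int).ravel())  # one keep/drop decision per hidden unit
```

With the matrix mask the zeros vary from column to column, so every sample effectively sees a different thinned network. With the column-vector mask, whole rows are zeroed for the entire batch.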

ekylberg · January 4, 2022, 10:02pm

From reading that topic, it seems to me that the poster is confused about what the dimensions of the weight matrices correspond to. He’s associating the columns of the weight matrix with training examples, when in actuality the columns are the weights applied to the node outputs of the previous layer. I think he’s mixing together the notions of batches and matrix multiplication. My impression is that his intuition concurs with mine, but he’s mis-attributing the problem that he sees.

ekylberg · January 4, 2022, 10:17pm

Allow me to amend my earlier comment. I might be confused about something.

I thought that columns in the Weight matrix correspond to the node output values from the previous layer, not to different samples.

paulinpaloalto · January 4, 2022, 10:21pm

Yes, I think what you are confused about is that this has nothing to do with the weight matrices: the masks are applied to the output of the layer, after the activation function has been applied. It is A1 and A2 that are multiplied elementwise by (in effect ANDed with) D1 and D2, right? So you really are “zapping” (different) individual neuron outputs in individual samples. The columns of the activation matrices do represent the output neuron values for individual input samples, right?

The columns of a weight matrix have nothing to do with samples. It is the rows of the weight matrices that represent the coefficients w.r.t. the inputs of that layer that give one particular neuron output value to the next layer.
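A small shape check may help here. This is only a sketch under assumed sizes (3 inputs, 4 hidden units, 5 samples) with made-up variable values, using the course's convention that samples are columns. It is not the notebook's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed, for this sketch only
n_x, n_h, m = 3, 4, 5            # hypothetical sizes: inputs, hidden units, samples

X = rng.standard_normal((n_x, m))     # one column per sample
W1 = rng.standard_normal((n_h, n_x))  # one row per hidden unit
b1 = np.zeros((n_h, 1))

Z1 = W1 @ X + b1                 # shape (n_h, m)
A1 = np.maximum(Z1, 0)           # ReLU; still one column per sample

# Row i of W1 produces hidden unit i's value for every sample.
i = 2
row_i = W1[i] @ X + b1[i]        # shape (m,), equals Z1[i]

# The dropout mask matches A1, not W1, and W1 itself is never modified.
W1_before = W1.copy()
D1 = rng.random(A1.shape) < 0.5  # keep_prob = 0.5, chosen arbitrarily
A1_dropped = A1 * D1

print(W1.shape, A1.shape, D1.shape)  # (4, 3) (4, 5) (4, 5)
```

Entry D1[i, j] decides whether hidden unit i is dropped for sample j, so a zero in the mask removes a node for that sample and leaves every weight (edge) in W1 intact.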

ekylberg · January 4, 2022, 10:24pm

Thank you for helping me find my way through this. It totally makes sense now.

