W1 - Programming Assignment 2 - Improving Neural Network

Hi,

Why does the dropout matrix need to be the same size as the activation matrix? Since dropout removes only neurons, shouldn't it have shape (number of neurons, 1)?


Hi,
I'm not quite sure I understand what you mean. The activation itself is not a matrix; it's just a function that can be applied elementwise to any vector or matrix. The matrix A is simply the notation for what you get after applying the activation function, hence the name. You're right that a single example's activations would have shape (num_neurons, 1), but A stacks one such column per training example, so its shape is (num_neurons, num_examples).
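To make the shapes concrete, here is a minimal numpy sketch (the layer and batch sizes are illustrative, not taken from the assignment):

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

Z = np.random.randn(3, 5)  # pre-activations: (num_neurons, num_examples)
A = relu(Z)                # applying the activation preserves the shape
print(A.shape)             # (3, 5): one column per training example
```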

This is an interesting point. Prof Ng is not that specific about this in the lectures, but you can clearly see in the notebook that they define the “drop” mask in such a way that not all samples in the given batch are handled the same way w.r.t. dropout. That is the implication of the fact that the dropout mask is the same shape as the output activation at the given layer. Here’s a thread which discusses this in more detail.
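For concreteness, here is a minimal sketch of the inverted-dropout pattern in the style the notebook uses (keep_prob and the layer sizes here are illustrative):

```python
import numpy as np

np.random.seed(1)
keep_prob = 0.8
A = np.random.randn(4, 5)  # activations: (num_neurons, num_examples)

# The mask has the SAME shape as A, so each column (each sample in the
# batch) gets its own independent pattern of dropped neurons.
D = (np.random.rand(*A.shape) < keep_prob).astype(A.dtype)

A = (A * D) / keep_prob    # inverted dropout: rescale so E[A] is unchanged
```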


Thanks @paulinpaloalto, this makes sense. Since the dropout matrix is not the same for every training example, it makes sense to create it randomly and independently for each training example to reduce overfitting.

Yes, I think that's the intuition, but it's worth reading the other thread that I linked. A fellow student did some fairly detailed experimentation comparing the "different for every sample" method with the "consistent for every sample (within a given iteration, of course)" method, and there doesn't really seem to be much difference. But all this behavior is fundamentally statistical, so maybe with bigger experiments we'd be better able to see whether there are real differences in the results …
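If you want to try that kind of comparison yourself, a sketch of the two mask constructions might look like this (the function names are just for illustration):

```python
import numpy as np

def per_sample_mask(shape, keep_prob, rng):
    # Independent mask per column: shape (num_neurons, num_examples)
    return (rng.random(shape) < keep_prob).astype(float)

def shared_mask(shape, keep_prob, rng):
    # One mask per neuron, broadcast across the whole batch: (num_neurons, 1)
    return (rng.random((shape[0], 1)) < keep_prob).astype(float)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))
keep_prob = 0.8

# Both rescale by keep_prob; they differ only in whether all columns
# (samples) share one mask or each gets its own.
A_independent = A * per_sample_mask(A.shape, keep_prob, rng) / keep_prob
A_shared = A * shared_mask(A.shape, keep_prob, rng) / keep_prob
```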