Does dropout intensity depend on the input or the output of a layer?

In the Understanding Dropout lecture, Professor Ng mentions in an example that the 2nd layer of an NN has many weights (7x7, i.e. 7 inputs and 7 outputs), hence it needs a lower keep_prob (0.5) to regularize it more.

  • Am I correct in saying that this layer needs more regularization because of its high number of input nodes (7), and not necessarily because of its number of output nodes (also 7)? Earlier in the lecture, he explained that dropout spreads the weights out across the input nodes (using an example of a node with multiple inputs, each with an associated weight). In other words, dropout seems to primarily affect the input weights of the layer.

  • Does this also mean that layer 3 (3x7 weight matrix) needs more regularization, i.e. a lower keep_prob value (perhaps also 0.5, rather than the 0.7 shown in the lecture)? This layer also has a high number of input nodes (7).

Thank you in advance for helping clarify my confusion!

Hi, @chemgeostats !

You’re right that you should drop a larger fraction of units (a lower keep_prob) when you need more regularization. That is the basic concept. Nonetheless, as you will see when you advance through the course, a layer with ~50 weights is far from having many weights. You can check EfficientNet or ResNet, which have layers with millions of weights.
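To put the sizes in perspective, here is a quick count of the weights in the two layers from the question. The helper name `weight_count` and the dictionary layout are only illustrative; the shapes (7x7 and 3x7) are the ones quoted above, and biases are left out.

```python
def weight_count(n_out, n_in):
    """Number of entries in a weight matrix of shape (n_out, n_in)."""
    return n_out * n_in

# Shapes as quoted in the question: layer 2 is 7x7, layer 3 is 3x7.
layer_shapes = {"layer 2": (7, 7), "layer 3": (3, 7)}

counts = {name: weight_count(*shape) for name, shape in layer_shapes.items()}

for name, n in counts.items():
    print(f"{name}: {n} weights")
```

Both layers are tiny, which is why the exact keep_prob values in the lecture are better read as an illustration than as a rule.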

Therefore, choosing the dropout ratio for your particular case is sometimes more of an art, although you can see some basic patterns and logic underneath.
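If it helps to experiment, below is a minimal sketch of inverted dropout with a separate keep_prob per layer, written with NumPy. It assumes that a layer's keep_prob is applied to that layer's output activations (so the mask on layer 2's activations also removes inputs to layer 3), which is how I understand the course's implementation. The function name `inverted_dropout`, the `keep_probs` dictionary and the all-ones activations are made up for illustration; only the 0.5 and 0.7 values come from the question.

```python
import numpy as np

def inverted_dropout(a, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob, then rescale.

    Dividing by keep_prob keeps the expected value of the activations
    unchanged, so no extra scaling is needed at test time.
    """
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob, mask

rng = np.random.default_rng(0)

# keep_prob values mentioned in the question (layer index -> keep_prob).
keep_probs = {2: 0.5, 3: 0.7}

# Dummy activations: 7 units in layer 2, 3 units in layer 3, 1000 examples.
a2 = np.ones((7, 1000))
a3 = np.ones((3, 1000))

a2_dropped, mask2 = inverted_dropout(a2, keep_probs[2], rng)
a3_dropped, mask3 = inverted_dropout(a3, keep_probs[3], rng)

print("fraction of layer 2 units kept:", mask2.mean())
print("fraction of layer 3 units kept:", mask3.mean())
```

Changing the values in `keep_probs` and watching validation error is usually the practical way to settle on them, rather than deriving them from the layer shapes alone.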

