In the Understanding Dropout lecture, Professor Ng gives an example in which the 2nd layer of a neural network has a large weight matrix (7x7, i.e. 7 inputs and 7 outputs), and hence needs a lower keep_prob (0.5) to regularize it more heavily.
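For concreteness, here is a minimal numpy sketch of the inverted-dropout mechanic taught in the course, applied with the keep_prob of 0.5 mentioned above (the 7-unit activation vector and variable names are my own, just for illustration):

```python
import numpy as np

np.random.seed(0)

a1 = np.random.randn(7, 1)   # hypothetical activations feeding the 7x7 layer
keep_prob = 0.5              # lower keep_prob for the large layer, as in the lecture

# Inverted dropout: mask, drop, then rescale so the expected activation is unchanged
d1 = np.random.rand(*a1.shape) < keep_prob
a1 = (a1 * d1) / keep_prob
```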
-
Am I correct in saying that this layer needs more regularization because of its high number of input nodes (7), and not necessarily its number of output nodes (also 7)? I ask because earlier in the lecture, he explained that dropout spreads out the weights across the input nodes (using the example of a node with multiple inputs, each with an associated weight). In other words, dropout seems to primarily affect a layer's input weights.
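To illustrate what I mean (a sketch under my own assumptions, not taken from the lecture): when an input unit is dropped, the entire column of W that multiplies it contributes nothing to z, so the layer cannot lean too heavily on any single incoming weight:

```python
import numpy as np

np.random.seed(1)

W = np.random.randn(7, 7)                # the 7x7 weight matrix from the example
a_prev = np.random.randn(7, 1)           # inputs to this layer
d = np.random.rand(*a_prev.shape) < 0.5  # drop roughly half the input units
z = W @ ((a_prev * d) / 0.5)             # dropped inputs zero out whole columns of W
```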
-
Does this also mean that layer 3 (with its 3x7 weight matrix) needs more regularization, i.e. a lower keep_prob (perhaps also 0.5, rather than the 0.7 shown in the lecture)? After all, this layer also has a high number of input nodes (7).
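For reference, here is a quick per-layer weight count, assuming layer sizes consistent with the 7x7 and 3x7 matrices above (the other sizes are my guess at the lecture's network, not confirmed):

```python
# Hypothetical layer sizes consistent with the 7x7 (W2) and 3x7 (W3) matrices above
layer_dims = [3, 7, 7, 3, 2, 1]
for l in range(1, len(layer_dims)):
    n_out, n_in = layer_dims[l], layer_dims[l - 1]
    print(f"W{l}: {n_out}x{n_in} = {n_out * n_in} weights")
```

Under these assumed sizes, layer 3 shares the same 7 inputs as layer 2 but has fewer weights in total (21 vs. 49), which is why I am unsure whether the input count alone should drive the keep_prob choice.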
Thank you in advance for helping clarify my confusion!