Hi Everyone,

When learning about scaled random weight initialization to prevent vanishing/exploding gradients, we see that W[l] = np.random.randn(shape of W[l]) * np.sqrt(1/n[l-1]), where n[l-1] is the number of nodes in layer l-1.
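For concreteness, here is a minimal sketch of the initialization I mean. The helper name `init_weights`, the layer sizes, and the (n[l], n[l-1]) weight shape are my own assumptions for illustration, not something taken from a particular library:

```python
import numpy as np

def init_weights(n_prev, n_curr, rng):
    """Hypothetical helper: weights for layer l, assumed shape (n[l], n[l-1])."""
    # Draw standard-normal weights, then scale them so Var(W) = 1 / n_prev
    return rng.standard_normal((n_curr, n_prev)) * np.sqrt(1.0 / n_prev)

rng = np.random.default_rng(0)
W = init_weights(n_prev=500, n_curr=300, rng=rng)  # made-up layer sizes

# The empirical standard deviation should be close to sqrt(1 / n_prev)
print(W.shape, W.std(), np.sqrt(1.0 / 500))
```

The scaling factor depends only on n[l-1], which is what prompts the question below.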

Does this initialization still work with dropout regularization, given that n[l-1], the number of nodes in layer l-1 feeding each neuron in layer l, effectively changes on every training iteration as nodes are randomly dropped?
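To make the concern concrete, here is a small sketch of how the number of active nodes in layer l-1 varies from one dropout mask to the next. The layer size, the `keep_prob` value, and the helper name `active_inputs` are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_prev = 500      # nodes in layer l-1 (hypothetical size)
keep_prob = 0.8   # probability of keeping a node (hypothetical value)

def active_inputs(rng, n_prev, keep_prob):
    """Sample one dropout mask over layer l-1 and count the surviving nodes."""
    mask = rng.random(n_prev) < keep_prob
    return int(mask.sum())

# One mask per training iteration: the count fluctuates around n_prev * keep_prob
counts = [active_inputs(rng, n_prev, keep_prob) for _ in range(5)]
print(counts)
```

So the effective fan-in seen by each neuron in layer l is different on each iteration, while the weights were scaled using the full n[l-1].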