Clarification for the variance 1/n

hey everyone, regarding this video

I wondered if there is any resource that proves or gives some intuition as to why `w[l] = np.random.randn(shape) * np.sqrt(2/n[l-1])` brings the variance to 1/n as mentioned in the video. The explanation seemed a bit hand-wavy and I could not figure out the math.

Any references would be appreciated

Please give a time mark within the video where your question applies.

Note that it’s np.random.randn, not np.random.rand: randn samples a normal (Gaussian) distribution with \mu = 0 and \sigma = 1. You then multiply the generated matrix by \sqrt{\frac {2}{n^{[l-1]}}}. Remember that variance is the square of the standard deviation (\sigma^2), and scaling a random variable by a constant c scales its variance by c^2. So the resulting variance is \frac {2}{n^{[l-1]}}, which is O(\frac {1}{n^{[l-1]}}).
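You can check this scaling property empirically. A minimal sketch (the layer sizes here are just illustrative, not from the video):

```python
import numpy as np

rng = np.random.default_rng(0)

n_prev = 500  # hypothetical fan-in n^{[l-1]}
W = rng.standard_normal((100, n_prev)) * np.sqrt(2 / n_prev)

# standard_normal draws have variance 1; multiplying by sqrt(2/n_prev)
# multiplies the variance by 2/n_prev, since Var(cX) = c^2 Var(X).
print(W.var())  # close to 2 / n_prev
```

With 100 × 500 samples, the sample variance lands very close to the target 2/n^{[l-1]}.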

Prof Ng mentions in the video that using \frac {2}{n^{[l-1]}} works better if you are using ReLU activations, while \frac {1}{n^{[l-1]}} generally works better with other activations like tanh.
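That choice can be wrapped in a small helper. This is just a sketch of the idea (the function name, layer sizes, and seed are my own, not from the course):

```python
import numpy as np

def init_layer(n_out, n_in, activation="relu", rng=None):
    """Scale the initial variance by 2/n_in for ReLU, 1/n_in otherwise."""
    rng = rng or np.random.default_rng()
    numerator = 2.0 if activation == "relu" else 1.0
    return rng.standard_normal((n_out, n_in)) * np.sqrt(numerator / n_in)

rng = np.random.default_rng(1)
W_relu = init_layer(64, 256, activation="relu", rng=rng)  # variance ~ 2/256
W_tanh = init_layer(64, 256, activation="tanh", rng=rng)  # variance ~ 1/256
```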

Thank you @paulinpaloalto, this makes sense.