Clarification for the variance 1/n

hey everyone, regarding this video

I wondered if there is any resource that proves or gives some intuition as to why `w[l] = np.random.randn(shape) * np.sqrt(2/n[l-1])` brings the variance to 1/n as mentioned in the video. The explanation seemed a bit hand-wavy and I could not figure out the math.

Any references would be appreciated

Please give a time mark within the video where your question applies.

Note that it’s np.random.randn, not np.random.rand: randn samples a normal (Gaussian) distribution with \mu = 0 and \sigma = 1. You then multiply the generated matrix by \sqrt{\frac {2}{n^{[l-1]}}}. Remember that variance is the square of the standard deviation (\sigma^2), and scaling a random variable by a constant c scales its variance by c^2. So the resulting variance is \frac {2}{n^{[l-1]}}, which is O(\frac {1}{n^{[l-1]}}).
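You can check this scaling property empirically. A minimal sketch (the layer sizes here are just illustrative, not from the video):

```python
import numpy as np

rng = np.random.default_rng(0)

n_prev = 500  # hypothetical fan-in n^{[l-1]}
W = rng.standard_normal((100, n_prev)) * np.sqrt(2 / n_prev)

# standard_normal draws have variance 1; multiplying by sqrt(2/n_prev)
# multiplies the variance by 2/n_prev, since Var(cX) = c^2 Var(X).
print(W.var())  # close to 2 / n_prev
```

With 100 × 500 samples, the sample variance lands very close to the target 2/n^{[l-1]}.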

Prof Ng mentions in the video that using \frac {2}{n^{[l-1]}} works better if you are using ReLU activations, while \frac {1}{n^{[l-1]}} generally works better with other activations like tanh.
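That choice can be wrapped in a small helper. This is just a sketch of the idea (the function name, layer sizes, and seed are my own, not from the course):

```python
import numpy as np

def init_layer(n_out, n_in, activation="relu", rng=None):
    """Scale the initial variance by 2/n_in for ReLU, 1/n_in otherwise."""
    rng = rng or np.random.default_rng()
    numerator = 2.0 if activation == "relu" else 1.0
    return rng.standard_normal((n_out, n_in)) * np.sqrt(numerator / n_in)

rng = np.random.default_rng(1)
W_relu = init_layer(64, 256, activation="relu", rng=rng)  # variance ~ 2/256
W_tanh = init_layer(64, 256, activation="tanh", rng=rng)  # variance ~ 1/256
```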

Thank you @paulinpaloalto, this makes sense.