Dear Mentor,
Could you please guide me on this issue?
For weight initialization, we initialize W using a Gaussian distribution with mean 0 and variance 1/n.
From the lecture, we have to use this code to initialize W:
W = np.random.randn(*shape) * np.sqrt(1 / n)
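A minimal runnable sketch of that lecture line for a single layer follows. It assumes (as in the course notation) that n is the number of inputs to the layer, here called n_prev, and that W has shape (n_curr, n_prev). The helper name init_layer_weights and the layer sizes are hypothetical. It uses rng.standard_normal, which draws from the same N(0, 1) distribution as np.random.randn.

```python
import numpy as np

def init_layer_weights(n_prev, n_curr, seed=0):
    """Hypothetical helper: W ~ N(0, 1/n_prev) with shape (n_curr, n_prev), plus zero biases."""
    rng = np.random.default_rng(seed)
    # Standard normal samples: mean 0, standard deviation 1 (same distribution as np.random.randn).
    z = rng.standard_normal((n_curr, n_prev))
    # Multiply by the standard deviation sqrt(1/n_prev), so the variance becomes 1/n_prev.
    W = z * np.sqrt(1.0 / n_prev)
    b = np.zeros((n_curr, 1))
    return W, b

# Usage with made-up layer sizes: 400 inputs feeding 300 units.
W1, b1 = init_layer_weights(n_prev=400, n_curr=300)
print(W1.shape)           # (300, 400)
print(W1.var(), 1 / 400)  # the sample variance should be close to 1/400
```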
which refers to
W = Gaussian random variables * standard deviation
May I know how to prove that this is valid?
Why don’t we use this equation?
W = Gaussian random variables * variance
Thank you.
As far as I understand, after reading the included link, this is a way of calculating the variance.
Section 4 (Week 4).
And the larger point here is that this is not a question of “mathematical correctness”. It’s just a practical question of what works in a given situation. Notice that there is no single “one size fits all” initialization scheme: Prof Ng shows us 3 or 4 different ones and even that is not all the possibilities.
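To illustrate that the scale factor is a design choice rather than a fixed rule, here is a hedged sketch of a few variance choices commonly presented in the course: variance 1/n (often paired with tanh), 2/n (He initialization, often paired with ReLU), and 2/(n_prev + n_curr). The function name init_weights and the scheme labels are my own hypothetical choices; only the target variance changes between schemes.

```python
import numpy as np

def init_weights(n_prev, n_curr, scheme="tanh", seed=0):
    """Hypothetical helper: Gaussian init whose variance depends on the chosen scheme."""
    variances = {
        "tanh": 1.0 / n_prev,                # variance 1/n
        "relu": 2.0 / n_prev,                # variance 2/n (He initialization)
        "fan_avg": 2.0 / (n_prev + n_curr),  # variance 2/(n_prev + n_curr)
    }
    if scheme not in variances:
        raise ValueError(f"unknown scheme: {scheme}")
    rng = np.random.default_rng(seed)
    # Scale standard normal samples by the standard deviation (square root of the target variance).
    return rng.standard_normal((n_curr, n_prev)) * np.sqrt(variances[scheme])

for scheme in ("tanh", "relu", "fan_avg"):
    W = init_weights(500, 200, scheme)
    print(scheme, W.var())
```

Which one works best is an empirical question for the network and activation at hand.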
According to the lecture video, we can set the variance of W to 1/n by multiplying these two terms:
W = np.random.randn(*shape) * np.sqrt(1 / n)
- Gaussian random variables → np.random.randn(*shape)
- Standard deviation → np.sqrt(1 / n)
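A quick numerical check of that mapping, as a sketch: it draws standard normal samples, scales them once by the standard deviation sqrt(1/n) and once by the variance 1/n, and compares the resulting sample variances. The value n = 100 and the sample count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                              # hypothetical number of inputs to the layer
z = rng.standard_normal(1_000_000)   # mean 0, variance 1, like np.random.randn

w_std = z * np.sqrt(1 / n)           # multiply by the standard deviation
w_var = z * (1 / n)                  # multiply by the variance instead

print(w_std.var(), 1 / n)            # sample variance close to 1/n
print(w_var.var(), 1 / n**2)         # sample variance close to 1/n**2, not 1/n
```

Scaling by the variance shrinks the spread too much: the weights end up with variance 1/n² instead of the intended 1/n.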
May I know the mathematical theory behind it?
Thank you
The default Gaussian distribution returned by np.random.randn has $\mu = 0$ and $\sigma = 1$. So if you multiply samples from that distribution by the constant $\displaystyle \frac{1}{\sqrt{n}}$, then we have:

$$\sigma = \frac{1}{\sqrt{n}}$$

and the variance is $\sigma^2 = \displaystyle \frac{1}{n}$, of course.
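The general rule behind this, written out as a short derivation: let $Z$ have mean 0 and variance 1 (the distribution np.random.randn samples from), let $c$ be a constant, and set $W = cZ$. Then

$$\operatorname{Var}(cZ) = E\big[(cZ)^2\big] - \big(E[cZ]\big)^2 = c^2\left(E[Z^2] - (E[Z])^2\right) = c^2 \operatorname{Var}(Z) = c^2$$

$$c = \frac{1}{\sqrt{n}} \;\Rightarrow\; \operatorname{Var}(W) = \frac{1}{n}, \qquad c = \frac{1}{n} \;\Rightarrow\; \operatorname{Var}(W) = \frac{1}{n^2}$$

Multiplying by a constant scales the standard deviation by $|c|$ and the variance by $c^2$ (and $cZ$ is still Gaussian with mean 0). So to reach variance $1/n$ you multiply by its square root, the standard deviation, rather than by the variance itself.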