Dear Mentor,

Could you please guide me on this issue?

For weight initialization, we initialize W using a Gaussian distribution with mean 0 and variance 1/n.

From the lecture, we use this code to initialize W:

W = np.random.randn(*shape) * np.sqrt(1 / n)

which refers to

W = Gaussian random variables * standard deviation

May I know how to prove that this is valid?

Why don’t we use this equation?

W = Gaussian random variables * variance

Thank you.

As far as I understand, after reading the included link, this is a way of setting the variance.

Section 4 (Week 4).

And the larger point here is that this is not a question of “mathematical correctness”. It’s just a practical question of what works in a given situation. Notice that there is no single “one size fits all” initialization scheme: Prof Ng shows us 3 or 4 different ones and even that is not all the possibilities.
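To make that concrete, here is a rough sketch of a few commonly used scale factors side by side. The helper name `init_weights` and the scheme labels are made up for illustration (they are not from the course code), and which scheme works best still depends on the activation function and the network.

```python
import numpy as np

def init_weights(n_out, n_in, scheme="inv_n", rng=None):
    """Hypothetical helper: draw an (n_out, n_in) Gaussian weight matrix
    whose standard deviation depends on the chosen scheme."""
    rng = np.random.default_rng() if rng is None else rng
    if scheme == "inv_n":      # variance 1/n_in, often paired with tanh
        std = np.sqrt(1.0 / n_in)
    elif scheme == "he":       # variance 2/n_in, often paired with ReLU
        std = np.sqrt(2.0 / n_in)
    elif scheme == "glorot":   # variance 2/(n_in + n_out)
        std = np.sqrt(2.0 / (n_in + n_out))
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # Standard normal samples (mean 0, std 1) scaled by the target std.
    return rng.standard_normal((n_out, n_in)) * std
```

In every case the pattern is the same: draw standard normal samples, then multiply by the standard deviation you want.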


According to the lecture video, we can set the variance of W to 1/n by multiplying these two terms:

W = np.random.randn(*shape) * np.sqrt(1 / n)

- Gaussian random variables → **np.random.randn(*shape)**
- Standard deviation → **np.sqrt(1 / n)**

May I know the mathematical theory behind it?

Thank you

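The default Gaussian distribution returned by `np.random.randn` has \mu = 0 and \sigma = 1. Multiplying a random variable by a constant c multiplies its standard deviation by |c| and its variance by c^2. So if you multiply samples from that distribution by the constant \displaystyle \frac {1}{\sqrt{n}}, then we have: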

\sigma = \displaystyle \frac {1}{\sqrt{n}}

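and the variance is \sigma^2 = \displaystyle \frac {1}{n}, of course. If you multiplied by the variance \displaystyle \frac {1}{n} instead, the result would have \sigma = \displaystyle \frac {1}{n} and variance \displaystyle \frac {1}{n^2}, which is not what we want.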
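If you want to convince yourself numerically, a quick check along these lines should do it. This is only a sketch: the value n = 100, the sample count, and the variable names are arbitrary choices for the experiment, not anything from the course notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n = 100                          # arbitrary fan-in chosen for this check
samples = rng.standard_normal(1_000_000)   # mean 0, standard deviation 1

# Multiply by the standard deviation sqrt(1/n): variance becomes about 1/n.
scaled_by_std = samples * np.sqrt(1.0 / n)

# Multiply by the variance 1/n instead: variance becomes about 1/n**2.
scaled_by_var = samples * (1.0 / n)

var_std = scaled_by_std.var()
var_var = scaled_by_var.var()

print("target variance 1/n:       ", 1.0 / n)
print("scaled by sqrt(1/n), var = ", var_std)
print("scaled by 1/n,       var = ", var_var)
```

The first empirical variance should land close to 1/n = 0.01, and the second close to 1/n^2 = 0.0001, which is why the code multiplies by the standard deviation rather than by the variance.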