Doubt on weight initialization

Dear Mentor,

Could you please guide me on this issue?

For weight initialization, we initialize W from a Gaussian distribution with mean 0 and variance $1/n^{[l-1]}$.

From the lecture, we have to use this code to initialize W:

W = random.randn(shape) * sqrt(1/n[l-1])

which corresponds to

W = Gaussian random variables * standard deviation

May I know how to prove that this is valid?
Why don't we use this equation instead?

W = Gaussian random variables * variance

Thank you.

As far as I understand after reading the included link, this is a way of calculating the variance.

Section 4 (Week 4).

And the larger point here is that this is not a question of “mathematical correctness”. It’s just a practical question of what works in a given situation. Notice that there is no single “one size fits all” initialization scheme: Prof Ng shows us 3 or 4 different ones and even that is not all the possibilities.
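To make the "no one size fits all" point concrete, here is a minimal sketch of three scaling rules that are commonly used for this kind of initialization. Which ones the lecture has in mind is my assumption, and the function name `init_weights`, the scheme labels, and the layer sizes are all made up for illustration. The only thing that changes between schemes is the target variance.

```python
import numpy as np

def init_weights(n_out, n_in, scheme="xavier", seed=0):
    """Hypothetical helper: draw an (n_out, n_in) weight matrix.

    n_in plays the role of n^{[l-1]} and n_out the role of n^{[l]}.
    """
    rng = np.random.default_rng(seed)
    # Target variance for each (assumed) scheme
    target_variance = {
        "xavier": 1.0 / n_in,               # often paired with tanh
        "he": 2.0 / n_in,                   # often paired with ReLU
        "glorot_avg": 2.0 / (n_in + n_out), # averages fan-in and fan-out
    }
    # Scale standard normal samples by the standard deviation,
    # i.e. the square root of the target variance
    std = np.sqrt(target_variance[scheme])
    return rng.standard_normal((n_out, n_in)) * std

W_xavier = init_weights(400, 500, "xavier")
W_he = init_weights(400, 500, "he")
W_glorot = init_weights(400, 500, "glorot_avg")
print(W_xavier.var(), W_he.var(), W_glorot.var())
```

None of these is "the correct one" in a mathematical sense. They are different practical choices, and you pick whichever trains best for your activation function and network.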


According to the lecture video, we can set the variance of W to 1/n by multiplying these two terms:

W = random.randn(shape) * sqrt (1/n)

  1. Gaussian random variables → random.randn(shape)
  2. Standard deviation → sqrt (1/n)

May I know the mathematical theory behind it?

Thank you

The default Gaussian distribution returned by np.random.randn has $\mu = 0$ and $\sigma = 1$. Multiplying a random variable by a constant $c$ multiplies its standard deviation by $|c|$, so if you multiply that distribution by the constant $\frac{1}{\sqrt{n}}$, then we have:

$$\sigma = \frac{1}{\sqrt{n}}$$

and the variance is $\sigma^2 = \frac{1}{n}$, of course.
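If you want to convince yourself numerically, here is a small sketch. The value of `n` and the weight shape are arbitrary numbers I picked so the estimates are stable; they are not from the lecture. It compares scaling by the standard deviation with scaling by the variance, which was the alternative asked about earlier in the thread. Since Var(cX) = c^2 Var(X), multiplying by 1/n should give a variance of roughly 1/n^2 instead of 1/n.

```python
import numpy as np

np.random.seed(0)            # fixed seed so the run is repeatable
n = 100                      # assumed fan-in, i.e. n^{[l-1]}
shape = (1000, n)            # assumed weight shape, large enough for stable estimates

Z = np.random.randn(*shape)  # standard normal: mean 0, variance 1

W_std = Z * np.sqrt(1 / n)   # scale by the standard deviation
W_var = Z * (1 / n)          # scale by the variance instead

var_std = W_std.var()        # should land near 1/n
var_var = W_var.var()        # should land near 1/n**2

print("target variance 1/n      :", 1 / n)
print("scaled by std, measured  :", var_std)
print("scaled by var, measured  :", var_var)
print("1/n**2 for comparison    :", 1 / n**2)
```

So multiplying by the variance does not produce a distribution with that variance. You have to multiply by the square root.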