Doubt on weight initialization

JJaassoonn · August 19, 2023, 3:14pm

Dear Mentor,

Could you please guide me on this issue?

For weight initialization, we initialize W using a Gaussian distribution with mean 0 and variance 1/n^{[l-1]}.

From the lecture, we have to use this code to initialize W:

W = random.randn(shape) * sqrt(1/n^{[l-1]})

which refers to

W = Gaussian random variables * standard deviation

May I know how to prove that this is valid?
Why don’t we use this equation?

W = Gaussian random variables * variance

Thank you.

gent.spah · August 19, 2023, 5:54pm

As far as I understand, after reading the included link, this is a way of calculating the variance.

Section 4 (Week 4).

paulinpaloalto · August 19, 2023, 6:14pm

And the larger point here is that this is not a question of “mathematical correctness”. It’s just a practical question of what works in a given situation. Notice that there is no single “one size fits all” initialization scheme: Prof Ng shows us 3 or 4 different ones and even that is not all the possibilities.

JJaassoonn · August 19, 2023, 7:12pm

According to the lecture video, we can set the variance of W to 1/n by multiplying the two terms:

W = random.randn(shape) * sqrt (1/n)

Gaussian random variables → random.randn(shape)
Standard deviation → sqrt (1/n)

May I know the mathematical theory behind it?

Thank you

paulinpaloalto · August 19, 2023, 7:26pm

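The default Gaussian distribution returned by np.random.randn has \mu = 0 and \sigma = 1. Multiplying a random variable by a constant c multiplies its standard deviation by |c| and its variance by c^2. So if you multiply that distribution by the constant \displaystyle \frac {1}{\sqrt{n}}, then we have:

\sigma = \displaystyle \frac {1}{\sqrt{n}}

and the variance is \sigma^2 = \displaystyle \frac {1}{n}, which is the target value. If you multiplied by the variance \displaystyle \frac {1}{n} instead, the resulting variance would be \displaystyle \frac {1}{n^2}, which is not what we want.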
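If it helps, here is a small numerical check of that scaling rule. It is only a sketch: the sizes `n_in` and `n_out` and the seed are made-up values for illustration, not anything from the course notebooks. Note that np.random.randn takes the dimensions as separate arguments rather than as a tuple.

```python
import numpy as np

np.random.seed(0)   # fixed seed so the run is repeatable (arbitrary choice)

n_in = 100          # hypothetical number of inputs to the layer (the "n" above)
n_out = 1000        # hypothetical number of units in the layer

# Standard normal samples: mean 0, standard deviation 1
z = np.random.randn(n_out, n_in)

# Scale by the standard deviation sqrt(1/n), as in the lecture
w_std = z * np.sqrt(1 / n_in)

# Scale by the variance 1/n, the alternative asked about
w_var = z * (1 / n_in)

print("target variance 1/n:      ", 1 / n_in)
print("scaled by sqrt(1/n), var: ", w_std.var())   # roughly 1/n
print("scaled by 1/n, var:       ", w_var.var())   # roughly 1/n^2
```

The first sample variance should land near 1/n and the second near 1/n^2, which is why the code multiplies by the standard deviation and not by the variance.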
