C2W1 Weight Initialization for Deep Networks

Dear Deeplearning.ai team,

I’m writing regarding the video ‘Weight Initialization for Deep Networks’, the part from 2.22 to 2.58 (the variance correction when ReLU is used as the activation function).

In this part of the video, it is said:

“It turns out that if you’re using a ReLU activation function, then rather than 1 over n, setting the variance to 2 over n works a little bit better.”
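For reference, here is what that recommendation looks like in code. This is only a minimal NumPy sketch, assuming a fully connected network; the function name and layer sizes are made up for illustration and are not taken from the course:

```python
import numpy as np

def initialize_he(layer_dims, seed=0):
    """He initialization: W[l] ~ N(0, 2 / n[l-1]), b[l] = 0.

    `layer_dims` is a hypothetical list of layer sizes, e.g. [n_x, n_h1, ..., n_y].
    """
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        fan_in = layer_dims[l - 1]
        # Scale a standard normal by sqrt(2 / fan_in) so that Var(W[l]) = 2 / n[l-1].
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], fan_in)) * np.sqrt(2.0 / fan_in)
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

params = initialize_he([500, 300, 100, 1])
print(params["W1"].std())  # close to sqrt(2 / 500) ≈ 0.063
```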

Let’s consider x, a random variable drawn from a Normal distribution with mean = 0 and variance = 1.

Then the random variable y = max{0, x} has mean = 1 / sqrt(2*pi) and second moment E[y^2] = 1/2, so its variance is 1/2 - 1/(2*pi).

Now change the distribution of x to Normal with mean = 0 and variance = 1/n.

Then y = max{0, x} has mean = 1 / (sqrt(2*pi) * sqrt(n)) and second moment E[y^2] = 1 / (2*n), so its variance is (1/2 - 1/(2*pi)) / n.
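These values can be checked numerically. A quick Monte Carlo sketch in NumPy (the sample size and the value of n are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = 1_000_000
n = 10  # arbitrary fan-in for the variance = 1/n case

# x ~ N(0, 1): mean of y = max(0, x) should be 1/sqrt(2*pi) ≈ 0.399,
# E[y^2] should be 1/2, and Var(y) = 1/2 - 1/(2*pi) ≈ 0.341.
x = rng.standard_normal(samples)
y = np.maximum(0.0, x)
print(y.mean(), (y ** 2).mean(), y.var())

# x ~ N(0, 1/n): the mean scales by 1/sqrt(n), the second moment and the
# variance scale by 1/n, so rescaling should recover the same three numbers.
x_n = rng.normal(0.0, 1.0 / np.sqrt(n), samples)
y_n = np.maximum(0.0, x_n)
print(y_n.mean() * np.sqrt(n), (y_n ** 2).mean() * n, y_n.var() * n)
```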

So it seems that we end up with roughly 1/(2*n), not 2/n as in the video, doesn’t it?

I would be very grateful for your comments.
Thanks a lot!

Hi, @g.dychko.

Great question. Feel free to correct me if I got it wrong.

You are not far off. In the lecture, 2/n is the variance of w[l], not of y[l]. From the He initialization paper, the variance of the latter satisfies Var(y[l]) = (1/2) * n[l] * Var(w[l]) * Var(y[l-1]) (you can see the derivation on page 4). For the whole network you have Var(y[L]) = Var(y[1]) * prod over l of [ (1/2) * n[l] * Var(w[l]) ]. To prevent that product from becoming exponentially large or small they set (1/2) * n[l] * Var(w[l]) = 1 for every layer. Solving for Var(w[l]) you end up with 2/n.
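To see that condition in action, here is a small numerical sketch (the layer width, depth, and batch size are arbitrary, and relu_forward_variance is just an illustrative helper name). It pushes a random batch through a stack of ReLU layers and compares Var(w[l]) = 2/n with 1/n:

```python
import numpy as np

def relu_forward_variance(weight_std_fn, n=512, depth=30, batch=2000, seed=0):
    """Propagate a random batch through `depth` ReLU layers of width `n`
    and return the variance of the activations after each layer."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((n, batch))
    variances = []
    for _ in range(depth):
        # weight_std_fn maps fan_in -> standard deviation of the weights.
        W = rng.standard_normal((n, n)) * weight_std_fn(n)
        y = np.maximum(0.0, W @ y)
        variances.append(y.var())
    return variances

he = relu_forward_variance(lambda fan_in: np.sqrt(2.0 / fan_in))
one_over_n = relu_forward_variance(lambda fan_in: np.sqrt(1.0 / fan_in))
print(he[-1])          # stays O(1): (1/2) * n * Var(w) = 1 at every layer
print(one_over_n[-1])  # shrinks roughly like (1/2)^depth
```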

Hope that made sense :)


Dear @nramon,

Your explanation is absolutely convincing.
Thank you so much for the link!

With best regards,
Galia


Thank you, I’m really glad I could help.

Keep up the great work :)