Dear Deeplearning.ai team,

I’m writing regarding the video **‘Weight Initialization for Deep Networks’**, 2:22-2:58 (the variance correction when ReLU is used as the activation function).

In this part of the video, it is said that:

“It turns out that if you’re using a ReLu activation function that, rather than 1 over n it turns out that, set in the variance of 2 over n works a little bit better.”

Let’s consider x, a random variable drawn from a normal distribution with mean = 0 and variance = 1.

Then the random variable y = max{0, x} has mean = 1 / sqrt(2*pi) and second moment E[y^2] = 1/2 (so its variance is 1/2 - 1/(2*pi)). Links with the computations: mean, variance
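As a quick sanity check of these values, here is a small Monte Carlo sketch in NumPy (the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000_000)  # x ~ N(0, 1)
y = np.maximum(0.0, x)               # y = max{0, x}

print(y.mean())       # ≈ 1 / sqrt(2*pi) ≈ 0.3989
print(np.mean(y**2))  # second moment E[y^2] ≈ 1/2
print(y.var())        # variance ≈ 1/2 - 1/(2*pi) ≈ 0.3408
```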

If we instead let x be normal with mean = 0 and variance = 1/n,

then y = max{0, x} has mean = 1 / (sqrt(2*pi) * sqrt(n)) and second moment E[y^2] = 1 / (2*n) (variance = 1/(2*n) - 1/(2*pi*n)). Links with the computations: mean, variance
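The scaled case can be checked the same way (again a sketch; n = 100, the sample size, and the seed are arbitrary):

```python
import numpy as np

n = 100
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0 / np.sqrt(n), size=10_000_000)  # x ~ N(0, 1/n)
y = np.maximum(0.0, x)                                  # y = max{0, x}

print(y.mean())       # ≈ 1 / (sqrt(2*pi) * sqrt(n))
print(np.mean(y**2))  # second moment ≈ 1 / (2*n)
```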

**So it seems the relevant factor is 1/(2*n), not the 2/n from the video, doesn’t it?**
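If it helps frame the question: my reading is that the video’s 2/n is the variance chosen for the *weights*, whereas the 1/(2*n) above is a moment of the ReLU *output*. A small sketch of the weight-variance reading, where Var(w) = 2/n keeps the pre-activation scale stable across a ReLU layer (layer sizes, batch size, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 256     # fan-in (arbitrary)
m = 10_000  # batch size (arbitrary)

z_prev = rng.standard_normal((m, n))  # pre-activations with variance 1
a = np.maximum(0.0, z_prev)           # ReLU output: E[a^2] = 1/2

# Weights with variance 2/n -- the "2 over n" quoted from the video
W = rng.normal(0.0, np.sqrt(2.0 / n), size=(n, n))
z_next = a @ W

# The factor 2 in Var(w) compensates for ReLU halving E[a^2]
print(np.mean(z_next**2))  # ≈ 1: same scale as z_prev
```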

I would be very grateful for your comments.

Thanks a lot!