# C2W1 Weight Initialization

In the video we learn about initializing weights with a small variance, so that the weights won’t vary much from 1. This is done by multiplying the random elements of w by sqrt(2/n), which also reduces the mean. I understand the point about the variance, but I don’t understand why the result is more “centered” around 1. np.random.rand() samples from a uniform distribution on [0, 1), so the mean is 0.5. After the multiplication, wouldn’t the elements be even smaller, and certainly not around 1? And if we take a Gaussian distribution instead, the mean is 0, so the “centering” would be around 0.
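A quick NumPy check of the arithmetic in the question (a sketch; n = 100 is an arbitrary illustrative value, not something from the lecture): multiplying `np.random.rand` samples by sqrt(2/n) scales the mean of 0.5 down to 0.5·sqrt(2/n), so weights drawn this way would sit near 0, not near 1.

```python
import numpy as np

np.random.seed(1)  # fixed seed so the printed numbers are reproducible

n = 100                        # hypothetical number of inputs to the layer
scale = np.sqrt(2.0 / n)       # the sqrt(2/n) factor from the question

u = np.random.rand(1_000_000)  # uniform samples on [0, 1): mean 1/2, variance 1/12
u_scaled = u * scale           # scaling multiplies the mean by scale, the variance by scale**2

print(f"rand:         mean {u.mean():.4f}, var {u.var():.5f}")
print(f"rand * scale: mean {u_scaled.mean():.4f} (expected {0.5 * scale:.4f})")
print(f"              var  {u_scaled.var():.6f} (expected {scale**2 / 12:.6f})")
```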

Unless I’m missing something here, I would expect the argument above to work only if the mean of the random sample were 1, and as far as I understand, that is not the case.

Can you give us a reference to where the statement is made that the intent is to center the data around 1? I agree with your statement that this doesn’t seem to make sense. If it’s in the lectures, please give us the name of the lecture and the time offset.

Around minute 3:30 in the video I was referring to, “Weight Initialization for Deep Networks”:
https://www.coursera.org/learn/deep-neural-network/lecture/RwqYe/weight-initialization-for-deep-networks

I would add that the word “centering” is my own interpretation of what is being said, so I may have misunderstood.

OK, I listened to that section, and I think the confusion comes from the fact that when he talks about something staying reasonably close to 1, he is not talking about the weight values: he means the result of the linear combination, which is z. Listen again and notice that he bases these estimates on the assumption that the input values $x_i$ have $\mu = 0$ and $\sigma = 1$. So the goal is to keep z on a scale of about 1 (roughly unit variance) by shrinking the weights with a scale factor that has the number of terms in the sum, n, in the denominator.
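The estimate can be written out as a short variance calculation. This is a sketch under the lecture’s input assumptions plus one of our own: the weights $w_i$ are drawn independently of each other and of the inputs, with mean 0. Independence and zero means make the cross terms $\mathbb{E}[w_i w_j x_i x_j]$ vanish for $i \neq j$, so the variances of the individual terms simply add.

$$
\begin{aligned}
z &= \sum_{i=1}^{n} w_i x_i, \qquad \mathbb{E}[x_i] = 0,\ \operatorname{Var}(x_i) = 1,\ \mathbb{E}[w_i] = 0,\ \operatorname{Var}(w_i) = \sigma_w^2 \\
\operatorname{Var}(z) &= \sum_{i=1}^{n} \operatorname{Var}(w_i x_i) = \sum_{i=1}^{n} \mathbb{E}[w_i^2]\,\mathbb{E}[x_i^2] = n\,\sigma_w^2
\end{aligned}
$$

Choosing $\sigma_w^2 = 1/n$ (multiplying `np.random.randn` samples by $\sqrt{1/n}$) gives $\operatorname{Var}(z) = 1$; the $\sqrt{2/n}$ factor Prof Ng suggests for ReLU layers gives $\operatorname{Var}(z) = 2$ under the same assumptions.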

Note that Prof Ng always uses a Gaussian distribution (np.random.randn, not np.random.rand) with $\mu = 0$ for the random initialization values, so multiplying by a scale factor $c$ changes only the variance (by a factor of $c^2$), not the mean.
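A small NumPy simulation of this answer, as a sketch only: it assumes, purely for illustration, n = 250 inputs and 4,000 independent draws of (w, x), with x drawn from a standard Gaussian to match the lecture’s $\mu = 0$, $\sigma = 1$ assumption. Scaling `np.random.randn` weights leaves their mean near 0, and the variance of z tracks n times the weight variance.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is reproducible

n = 250              # hypothetical number of inputs (terms in the sum for z)
num_samples = 4_000  # independent (w, x) draws used to estimate the statistics

# Inputs with mean 0 and variance 1, as assumed in the lecture
x = np.random.randn(num_samples, n)

# Raw Gaussian weights: mean 0, variance 1
w_raw = np.random.randn(num_samples, n)

# Scaling changes only the variance; the mean stays at 0
w_var1n = w_raw * np.sqrt(1.0 / n)  # variance 1/n
w_var2n = w_raw * np.sqrt(2.0 / n)  # variance 2/n (the ReLU variant)

# z = sum_i w_i * x_i, computed row by row
z_raw = np.sum(w_raw * x, axis=1)
z_var1n = np.sum(w_var1n * x, axis=1)
z_var2n = np.sum(w_var2n * x, axis=1)

print(f"scaled weights: mean {w_var1n.mean():+.5f}, var {w_var1n.var():.5f} (1/n = {1 / n:.5f})")
print(f"Var(z), unscaled:  {z_raw.var():8.2f}  (about n = {n})")
print(f"Var(z), sqrt(1/n): {z_var1n.var():8.3f}  (about 1)")
print(f"Var(z), sqrt(2/n): {z_var2n.var():8.3f}  (about 2)")
```

Changing `n` should leave the last two lines near 1 and 2, while the unscaled variance grows in proportion to n.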