# Can the random initialization of weights return very small values using np.random.randn(x, y) * 0.01?

Please correct me if I am wrong, but np.random.randn(x, y) can generate very small values, as mentioned here.

If we were to multiply such a small value by 0.01, then g(z) would also be small, which is exactly the problem we are trying to avoid by using random initialization. This problem would be more pronounced in NNs with fewer hidden units. Is that correct? If yes, how do we avoid this situation? If not, can someone please explain?

I'm not sure I understand the question. np.random.randn samples the Normal (Gaussian) Distribution with \mu = 0 and \sigma = 1. That means that 99.7% of the values are in the range (-3, 3), although a few may be outside that range. If we then multiply by 0.01, 99.7% of the values will be in the range (-0.03, 0.03).
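You can check this empirically. Here's a minimal sketch (the sample size and seed are arbitrary choices, not anything from the course):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, for reproducibility
samples = rng.standard_normal((1000, 1000))  # same distribution as np.random.randn

# By the 68-95-99.7 rule, about 99.7% of standard-normal samples lie in (-3, 3)
frac = np.mean(np.abs(samples) < 3)
print(frac)

# Scaling every sample by 0.01 scales the interval the same way:
# roughly the same fraction now lies in (-0.03, 0.03)
scaled = samples * 0.01
frac_scaled = np.mean(np.abs(scaled) < 0.03)
print(frac_scaled)
```

Note that scaling changes \sigma from 1 to 0.01 but leaves the shape of the distribution unchanged, so the values are small but still spread out relative to one another.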

We need to randomly initialize the weights of our Neural Network before starting the training in order to achieve the required Symmetry Breaking. Here's a thread which explains what that is and why it is required.
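To see why symmetry breaking matters even with small values, here is a toy sketch (the layer sizes and tanh activation are illustrative assumptions, not from the thread). With zero initialization every hidden unit computes the identical activation, so they all receive identical gradients and never learn different features; random initialization, however small, makes the units distinct:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 5))  # 3 input features, 5 examples (arbitrary sizes)

# Zero initialization: every row of W is the same, so every hidden unit
# computes the same activation for every example (symmetry is never broken)
W_zero = np.zeros((4, 3))
A_zero = np.tanh(W_zero @ X)
print(np.allclose(A_zero[0], A_zero[1]))

# Small random initialization: units start out different, so gradients
# differ and the units can specialize during training
W_rand = rng.standard_normal((4, 3)) * 0.01
A_rand = np.tanh(W_rand @ X)
print(np.allclose(A_rand[0], A_rand[1]))
```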

If you are going to break symmetry, there is some advantage in doing it with relatively small values. It is less likely that you'll have problems with divergence or saturation of the output values if you start small. But it turns out that initialization is not such a simple and straightforward thing as one might wish: there is no one magic recipe that works best in all cases. It is somewhat situation dependent, so the choice of initialization algorithm is yet another hyperparameter (a choice that needs to be made by the system designer). Prof Ng will talk more about this in Course 2 of this series, so please stay tuned for that.
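As a preview of that "hyperparameter" point, here is a sketch of three commonly used scale choices for the same random draw (the layer sizes are made up; the Xavier/Glorot and He formulas are standard, but which one works best depends on the activation function and architecture):

```python
import numpy as np

rng = np.random.default_rng(2)
n_prev, n_curr = 100, 50  # hypothetical sizes of the previous and current layer

# Three common scalings of the same standard-normal draw:
W_small  = rng.standard_normal((n_curr, n_prev)) * 0.01                   # fixed small scale
W_xavier = rng.standard_normal((n_curr, n_prev)) * np.sqrt(1.0 / n_prev)  # Xavier/Glorot
W_he     = rng.standard_normal((n_curr, n_prev)) * np.sqrt(2.0 / n_prev)  # He (often used with ReLU)

# The resulting weight magnitudes differ by an order of magnitude,
# which is why the choice behaves like a hyperparameter
for name, W in [("0.01", W_small), ("Xavier", W_xavier), ("He", W_he)]:
    print(name, W.std())
```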
