Hi,

In the course, Andrew says that sqrt(2/n[l-1]) is He initialization and sqrt(1/n[l-1]) is Xavier initialization.

When I searched for more information about weight initialization, I read that Xavier initialization is

sqrt(2/(n[l-1] + n[l])), which Andrew says is Yoshua (Bengio) initialization.

So what is Xavier initialization, and when should I use sqrt(1/n[l-1]) versus sqrt(2/(n[l-1] + n[l]))?

As stated in the lesson, Xavier initialization is a method for initializing the weights of a neural network so that the variance of each layer's outputs is roughly the same as the variance of its inputs, which helps keep activations and gradients from shrinking or exploding as they pass through the layers.
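The sqrt(1/n[l-1]) factor follows from a one-line variance argument (this sketch assumes the weights and inputs are independent with zero mean):

```latex
z^{[l]} = \sum_{i=1}^{n^{[l-1]}} w_i\, a_i
\quad\Rightarrow\quad
\mathrm{Var}\!\left(z^{[l]}\right) = n^{[l-1]}\,\mathrm{Var}(w)\,\mathrm{Var}(a)
```

Setting Var(w) = 1/n[l-1] (i.e. a standard deviation of sqrt(1/n[l-1])) makes Var(z) = Var(a), so the variance is preserved from layer to layer.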

The two formulas you found are in fact both versions of Xavier (Glorot) initialization:

**sqrt(1/n[l-1])**

is the simplified form that scales by the fan-in only (the one Andrew uses in the course), while

**sqrt(2/(n[l-1] + n[l]))**

is the formula from the original Glorot and Bengio paper, which averages the fan-in and fan-out; that is why Andrew credits it to Yoshua Bengio. Both are intended for sigmoid or tanh activation functions, and in practice they behave very similarly, so either is fine.

For ReLU and its variants (like Leaky ReLU), use He initialization instead:

**sqrt(2/n[l-1])**

The extra factor of 2 compensates for ReLU zeroing out roughly half of its inputs.
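To make the three scaling factors concrete, here is a minimal NumPy sketch of layer-by-layer weight initialization. The function name, the `method` labels, and the use of a Gaussian (rather than uniform) distribution are my own choices for illustration, not from the course:

```python
import numpy as np

def initialize_weights(layer_dims, method="he", seed=0):
    """Initialize weights for a fully connected network.

    layer_dims: list of layer sizes [n0, n1, ..., nL]
    method: "xavier" = sqrt(1/n[l-1])            (sigmoid/tanh)
            "glorot" = sqrt(2/(n[l-1] + n[l]))   (sigmoid/tanh)
            "he"     = sqrt(2/n[l-1])            (ReLU and variants)
    """
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        fan_in, fan_out = layer_dims[l - 1], layer_dims[l]
        if method == "xavier":
            scale = np.sqrt(1.0 / fan_in)
        elif method == "glorot":
            scale = np.sqrt(2.0 / (fan_in + fan_out))
        elif method == "he":
            scale = np.sqrt(2.0 / fan_in)
        else:
            raise ValueError(f"unknown method: {method}")
        # Gaussian weights scaled to the chosen standard deviation;
        # biases start at zero, as in the course.
        params[f"W{l}"] = rng.standard_normal((fan_out, fan_in)) * scale
        params[f"b{l}"] = np.zeros((fan_out, 1))
    return params
```

For example, `initialize_weights([1000, 500], method="he")` produces a `W1` whose entries have standard deviation close to sqrt(2/1000) ≈ 0.045, so pre-activations of a ReLU layer keep roughly unit variance.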