Weight initialization Course 2 week 1

Pavel_Grobov · January 26, 2023, 8:51pm

Hi,
In the course, Andrew says that sqrt(2/n[l-1]) is He initialization and sqrt(1/n[l-1]) is Xavier initialization.
When I searched for more information about weight initialization, I read that Xavier initialization is
sqrt(2/(n[l-1] + n[l])), which Andrew says comes from Yoshua Bengio.

What is the Xavier initialization, and when should I use sqrt(1/n[l-1]) versus sqrt(2/(n[l-1] + n[l]))?

carlosrl · January 27, 2023, 5:40am

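As stated in the lesson, Xavier initialization is a method for initializing the weights of a neural network so that the variance of each layer's outputs is roughly the same as the variance of its inputs.
Both formulas you found are versions of Xavier initialization, and both are meant for sigmoid or tanh activation functions:

sqrt(1/n[l-1])
the simplified version shown in the lecture, which uses only the number of inputs to the layer

sqrt(2/(n[l-1] + n[l]))
the version from the paper by Xavier Glorot and Yoshua Bengio, which uses both the number of inputs and the number of outputs of the layer

For ReLU or its variants (like Leaky ReLU), use He initialization, sqrt(2/n[l-1]), instead.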
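
To compare the scaling factors discussed in this thread side by side, here is a minimal sketch in plain Python. The function names `init_std` and `init_weights` and the scheme labels are my own for illustration, not from the course; in the course notebooks the same factor would typically multiply `np.random.randn(n[l], n[l-1])`.

```python
import math
import random


def init_std(n_in, n_out, scheme):
    """Standard deviation for the weights of a layer with n_in inputs (n[l-1])
    and n_out units (n[l]). Scheme labels are hypothetical names."""
    if scheme == "he":             # sqrt(2 / n[l-1]), usually paired with ReLU
        return math.sqrt(2.0 / n_in)
    if scheme == "xavier_simple":  # sqrt(1 / n[l-1]), usually paired with tanh
        return math.sqrt(1.0 / n_in)
    if scheme == "xavier_glorot":  # sqrt(2 / (n[l-1] + n[l])), Glorot and Bengio's form
        return math.sqrt(2.0 / (n_in + n_out))
    raise ValueError(f"unknown scheme: {scheme}")


def init_weights(n_in, n_out, scheme, seed=0):
    """Return an n_out x n_in weight matrix (list of rows) drawn from a
    zero-mean Gaussian with the standard deviation chosen above."""
    rng = random.Random(seed)
    std = init_std(n_in, n_out, scheme)
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


# Example: a layer with 100 inputs and 100 units.
for name in ("he", "xavier_simple", "xavier_glorot"):
    print(name, init_std(100, 100, name))
```

Note that when n[l-1] equals n[l], the two Xavier forms give the same value, so they only differ for layers whose input and output sizes are different.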
