Parameter initialization question

Hi, I understand that when initializing W, multiplying by np.sqrt(1/n^{[l-1]}) can prevent exploding gradients since it makes W smaller. But I am confused about how this would prevent vanishing gradients. If w_i is already smaller than 1, multiplying it by np.sqrt(1/n^{[l-1]}) could make it even smaller, so wouldn't this make the vanishing gradient problem worse?

Hi @Lily007

welcome to the community and thanks for your first question!

The purpose of weight matrix initialisation is rather to keep the initialized weights in a reasonable range from the start. It is this comparable scale across layers that we want to achieve, so that we are well prepared to go ahead with the optimization and tune the weights to solve the problem.

Specifically:

The goal of Xavier Initialization is to initialize the weights such that the variance of the activations is the same across every layer. This constant variance helps prevent the gradients from exploding or vanishing. Note that the scaling factor does not simply make everything smaller: each unit in layer l sums n^{[l-1]} weighted inputs, so the variance of that sum grows with n^{[l-1]}, and dividing by sqrt(n^{[l-1]}) compensates for exactly that growth. The signal therefore neither shrinks nor blows up as it passes through the layers (see the small sketch below).

see also: Section 4 (Week 4)
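
To illustrate this, here is a minimal NumPy sketch (my own toy example, not course code; the layer width, depth and random inputs are assumptions chosen just for illustration). It forward-propagates random data through a stack of tanh layers, once with plain standard-normal weights and once with weights scaled by np.sqrt(1/n^{[l-1]}), and prints the standard deviation of the pre-activations z plus the fraction of saturated units per layer:

```python
import numpy as np

np.random.seed(0)

n_units = 500      # assumed layer width, purely illustrative
n_layers = 10      # assumed depth, purely illustrative
x = np.random.randn(n_units, 1000)   # 1000 random inputs with unit variance

def propagate(scale):
    """Forward pass through tanh layers; report std of z and saturation rate."""
    a = x
    for l in range(1, n_layers + 1):
        W = np.random.randn(n_units, n_units) * scale
        z = W @ a                              # linear step
        a = np.tanh(z)                         # activation
        saturated = np.mean(np.abs(a) > 0.99)  # fraction of saturated tanh units
        print(f"layer {l:2d}: std(z) = {z.std():8.3f}, saturated = {saturated:6.1%}")

print("W ~ N(0, 1), no scaling:")
propagate(1.0)

print("\nW scaled by np.sqrt(1/n):")
propagate(np.sqrt(1.0 / n_units))
```

With the unscaled weights, std(z) jumps to roughly sqrt(n_units) and almost all tanh units saturate right away; with the scaled weights, the pre-activations stay on the order of 1 through all layers, which is the "comparable range" mentioned above.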

Assuming that this reasonable range is given, my experience is that the vanishing gradient problem is more often caused by the activation functions:

Let’s take sigmoid or tanh. Here you have this risk because they saturate / flatten out at the tails, which makes the gradient “vanish”. For ReLU, by contrast, there is a reduced risk of vanishing gradients since the gradient in the positive section of the ReLU function is constant: it does not saturate, unlike sigmoid or tanh (a small numerical comparison follows below), see also: Activation functions
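
To make the saturation point concrete, here is a small sketch (my own example; the chosen z values are arbitrary) comparing the derivatives of sigmoid, tanh and ReLU at a few pre-activation values. For large |z| the sigmoid and tanh derivatives collapse towards 0, while the ReLU derivative stays at 1 for any positive z:

```python
import numpy as np

def d_sigmoid(z):
    # derivative of sigmoid: s(z) * (1 - s(z))
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def d_tanh(z):
    # derivative of tanh: 1 - tanh(z)^2
    return 1.0 - np.tanh(z) ** 2

def d_relu(z):
    # derivative of ReLU: 1 for z > 0, else 0
    return (z > 0).astype(float)

z = np.array([0.5, 2.0, 5.0, 10.0])
print("z          :", z)
print("sigmoid'(z):", np.round(d_sigmoid(z), 5))
print("tanh'(z)   :", np.round(d_tanh(z), 5))
print("relu'(z)   :", d_relu(z))
```

Since backpropagation multiplies these local derivatives layer by layer, repeatedly multiplying by values close to 0 is what makes the gradient vanish in deep networks.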

see also this thread.

Hope that helps!

Best regards
Christian