Weight initialisation

computerandgyein · July 1, 2024, 6:11am

In Week 1 – Video (Weight Initialization for Deep Networks),

Hello,

From this given image,

Can anyone explain why w_i gets smaller as n gets larger?
Also, what is the reason for taking (or assuming) the variance of w to be 1/n?

TMosh · July 1, 2024, 6:37am

Since the sigmoid(z) function saturates (its output is confined to the range 0.0 to 1.0), its gradient approaches zero when z is large in magnitude. z is the sum of all the w*x terms, so when there are lots of inputs the weights need to be small to keep z out of the range where the gradients are tiny.
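To put rough numbers on the saturation point, here is a minimal sketch that evaluates the sigmoid derivative at a few values of z. The helper names (`sigmoid`, `sigmoid_grad`, `grads`) and the chosen z values are only for illustration and are not from the course.

```python
import math

def sigmoid(z):
    # Logistic function; output lies strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z)), largest at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

# Illustrative z values (an assumption, chosen to show the trend).
grads = {z: sigmoid_grad(z) for z in (0.0, 2.0, 5.0, 10.0)}
for z, g in grads.items():
    print(f"z = {z:5.1f}   sigmoid'(z) = {g:.6f}")
```

The derivative peaks at 0.25 for z = 0 and shrinks quickly as z grows, which is why a large sum of w*x terms slows learning.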

computerandgyein · July 1, 2024, 7:29am

Thank you @TMosh,

Ohhhh you’re right, that makes sense.

tarunsaxena1000 · July 1, 2024, 3:46pm

Consider a single neuron with n input connections. The output of this neuron before the activation is a weighted sum of the inputs plus a bias term: z = w_1 x_1 + ... + w_n x_n + b.
If the inputs have zero mean and unit variance, and the weights are initialized with zero mean and variance σ^2, independently of each other and of the inputs, then each term w_i x_i has variance σ^2 and the terms are uncorrelated. The bias is a constant and adds no variance, so the variance of the weighted sum (the output of the neuron before applying the activation function) is nσ^2.
To maintain unit variance for the output, we set nσ^2 = 1. Thus σ^2 = 1/n.

This helps in preventing the gradients from either vanishing or exploding as they propagate backward through the network.
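The variance argument above can be checked with a small simulation. This is only a sketch under stated assumptions: inputs and weights are drawn from Gaussians, the bias is left out, and the function name `preactivation_variance`, the sample count, and the values of n are made up for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

def preactivation_variance(n, weight_var, samples=4000):
    """Estimate Var(z) for z = sum_i w_i * x_i with n inputs.

    Assumptions: x_i ~ N(0, 1), w_i ~ N(0, weight_var), all independent,
    and no bias term.
    """
    sigma = weight_var ** 0.5
    zs = []
    for _ in range(samples):
        z = sum(random.gauss(0.0, sigma) * random.gauss(0.0, 1.0)
                for _ in range(n))
        zs.append(z)
    return statistics.pvariance(zs)

results = {}
for n in (10, 100):
    unscaled = preactivation_variance(n, 1.0)      # Var(w) = 1
    scaled = preactivation_variance(n, 1.0 / n)    # Var(w) = 1/n
    results[n] = (unscaled, scaled)
    print(f"n = {n:3d}   Var(z) with Var(w)=1: {unscaled:7.2f}   "
          f"with Var(w)=1/n: {scaled:5.2f}")
```

With Var(w) = 1 the estimated variance of z should grow roughly in proportion to n, while with Var(w) = 1/n it should stay close to 1 regardless of n.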

This is just my intuition, as I am also still learning; for the details we need to read the research paper.
