Weight Initialization

hyder · August 12, 2021, 1:01am

Whether we initialize the W matrices from a plain normal distribution or then apply a scaling factor, as in Xavier initialization, does it make any sense (or any difference) to draw each column of W from the normal distribution independently?

paulinpaloalto · April 21, 2022, 6:30pm

I’m not sure I understand what you mean by a normal distribution per column independently. We’re talking about a random normal distribution, right? It’s either random or it’s not. If it’s really random, then what is the difference between calling it once for the whole matrix and calling it once per column, n times in total?
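Here is a minimal NumPy sketch of that point, in case it helps. The function names `init_whole` and `init_per_column`, the matrix shape, and the seed are all made up for illustration, and the `sqrt(1 / n_in)` factor is only one common Xavier-style variant, with `n_in` assumed to be the number of columns of W. It draws one matrix with a single call and another column by column, then compares their summary statistics.

```python
import numpy as np


def init_whole(n_rows, n_cols, rng):
    """Draw the full weight matrix with a single call."""
    return rng.standard_normal((n_rows, n_cols))


def init_per_column(n_rows, n_cols, rng):
    """Draw each column with its own call, then stack the columns side by side."""
    cols = [rng.standard_normal(n_rows) for _ in range(n_cols)]
    return np.column_stack(cols)


rng = np.random.default_rng(0)  # arbitrary seed, chosen only for repeatability
W_whole = init_whole(1000, 500, rng)
W_cols = init_per_column(1000, 500, rng)

# Either way, every entry is an independent N(0, 1) sample, so the mean and
# standard deviation of the two matrices should agree up to sampling noise.
for name, W in [("whole", W_whole), ("per-column", W_cols)]:
    print(name, W.shape, round(float(W.mean()), 3), round(float(W.std()), 3))

# A Xavier-style scale factor multiplies every entry by the same constant,
# so it does not care how the entries were drawn.
# Assumption: n_in is the number of columns of W.
n_in = W_cols.shape[1]
W_scaled = W_cols * np.sqrt(1.0 / n_in)
```

Both matrices should show a mean near 0 and a standard deviation near 1, which is what you would expect if the entries are independent draws in both cases.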

But maybe I’m missing your point and what you really meant was using different algorithms for different columns. I’ve never heard anyone discuss that, but maybe there is something interesting to be learned there. You could try some experiments and see if you see any interesting results when you try that. This is an experimental science: give it a try and see what you learn! Please share your results!

hyder · April 21, 2022, 11:39pm

You are right, initializing each column independently doesn’t make it any more like a normal distribution! Maybe I didn’t have a clear understanding of how random algorithms work at that time.
