There is a lot to say here. For starters, here's a pre-existing thread that discusses Symmetry Breaking and is worth a look. One interesting point covered on that thread is that symmetry breaking is not necessary for Logistic Regression, but it is once we move to real Neural Networks with more than just the output layer.
It also turns out that you need to break symmetry at all layers of the neural net. If you look at the formulas for how the gradients are computed, they mirror how forward propagation works, just run in reverse. In forward prop, each neuron in a given layer receives all the outputs of the previous layer as inputs and then applies its own weights to them. Going backwards, the gradients from the subsequent layers apply equally to all the neurons in the current layer, so if the neurons' weights start out the same, they stay the same. Any layer that you don't start with asymmetric weights will just stay symmetric, which defeats the purpose of having multiple neurons in that layer.
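To make that concrete, here is a minimal NumPy sketch of the argument (the toy data, layer sizes, learning rate and iteration count are all made up purely for illustration): a network with one hidden layer in which every weight starts at the same value. However long you train, every row of the hidden layer's weight matrix stays identical to every other row, so the hidden neurons never learn anything different from each other.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 features, 50 examples, random binary labels (made up for illustration)
X = rng.normal(size=(4, 50))
Y = rng.integers(0, 2, size=(1, 50)).astype(float)

# Symmetric initialization: every weight in a layer starts at the same constant
W1, b1 = np.zeros((5, 4)), np.zeros((5, 1))   # hidden layer, 5 neurons
W2, b2 = np.zeros((1, 5)), np.zeros((1, 1))   # output layer

lr, m = 0.5, X.shape[1]
for _ in range(1000):
    # Forward prop
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    # Backprop (sigmoid output with cross-entropy loss)
    dZ2 = A2 - Y
    dW2, db2 = dZ2 @ A1.T / m, dZ2.mean(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    dW1, db1 = dZ1 @ X.T / m, dZ1.mean(axis=1, keepdims=True)
    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(W1)                        # every row is identical
print(np.allclose(W1, W1[0]))    # -> True: the hidden neurons never diverged
```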
Of course, all of this is an experimental science. If you still have doubts and don't want to actually do the calculus, it's easy enough just to try the experiment you are suggesting: e.g., define a 3-layer network and initialize the middle layer symmetrically, but randomly initialize the other two layers. Then run the training and watch what happens: print out the weight matrix of the initially symmetric layer after training and see what it looks like.
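For reference, here is one way that experiment might look as code, reusing the style of the sketch above (again, the data, sizes and hyperparameters are arbitrary choices, not anything prescribed): three weight layers, with only the middle one initialized to a constant, and the middle layer's weights printed after training so you can inspect them.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = rng.normal(size=(4, 50))                     # same kind of toy data as before
Y = rng.integers(0, 2, size=(1, 50)).astype(float)

# Three weight layers: only the middle one starts out symmetric
W1 = rng.normal(size=(5, 4)) * 0.01              # random
W2 = np.zeros((5, 5))                            # symmetric (every weight the same)
W3 = rng.normal(size=(1, 5)) * 0.01              # random
b1, b2, b3 = np.zeros((5, 1)), np.zeros((5, 1)), np.zeros((1, 1))

lr, m = 0.5, X.shape[1]
for _ in range(1000):
    # Forward prop through all three layers
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    A3 = sigmoid(W3 @ A2 + b3)

    # Backprop (compute all gradients before updating any weights)
    dZ3 = A3 - Y
    dZ2 = (W3.T @ dZ3) * A2 * (1 - A2)
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)

    W3 -= lr * (dZ3 @ A2.T / m); b3 -= lr * dZ3.mean(axis=1, keepdims=True)
    W2 -= lr * (dZ2 @ A1.T / m); b2 -= lr * dZ2.mean(axis=1, keepdims=True)
    W1 -= lr * (dZ1 @ X.T / m);  b1 -= lr * dZ1.mean(axis=1, keepdims=True)

print(W2)   # the initially symmetric middle layer: see what it looks like after training
```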
The other high-level point here (which was also mentioned on the thread above) is that it's not just setting all the weights to zero that causes the problem: setting them all to any fixed value does. There's nothing magically bad about zero; it's the symmetry that's bad.
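As a quick check of that last point, try swapping the `np.zeros((5, 4))` in the first sketch for, say, `np.full((5, 4), 0.5)`: the hidden layer's rows still finish training identical to one another, because what matters is that they started out equal, not which value they started at.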