How does Random Initialization prevent convergence?

seantolino · July 5, 2021, 6:21am

Hello,

I’m wondering how randomly initializing the weights in a neural net prevents the different nodes in a single layer from eventually converging to the same value. Since all of the nodes within a single layer take exactly the same input values, would they not all have the same optimum? Even if they don’t reach the optimum simultaneously, would they not all eventually converge to the same optimum?

Thanks!

kenb · July 7, 2021, 6:28pm

Hi @seantolino and welcome to the DL Specialization.

You may have accidentally gotten your subject line wrong. Random Initialization is a (necessary) component of convergence to a minimum cost (hopefully a global minimum at that).

The initial parameters need to “break symmetry” between different units. That is, if two hidden units with identical activations are connected to the same inputs, then these units must have different initial parameters. If they do not, the gradient descent algorithm will always update both of the units in the same way. In a sense, the units would be redundant. The algorithm needs to explore the parameter space for learning to occur.
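To make the "same update" claim concrete, here is a short derivation for a single hidden layer. The notation is my own assumption, not from the course: hidden unit $j$ has weights $w_j$ and bias $b_j$, activation function $g$, output weight $v_j$, output bias $c$, and $\mathcal{L}$ is the loss.

$$
z_j = w_j^\top x + b_j, \qquad a_j = g(z_j), \qquad z^{\text{out}} = \sum_j v_j a_j + c
$$

$$
\frac{\partial \mathcal{L}}{\partial w_j} = \frac{\partial \mathcal{L}}{\partial z^{\text{out}}}\, v_j\, g'(z_j)\, x, \qquad
\frac{\partial \mathcal{L}}{\partial b_j} = \frac{\partial \mathcal{L}}{\partial z^{\text{out}}}\, v_j\, g'(z_j), \qquad
\frac{\partial \mathcal{L}}{\partial v_j} = \frac{\partial \mathcal{L}}{\partial z^{\text{out}}}\, a_j
$$

If two units start with $(w_1, b_1, v_1) = (w_2, b_2, v_2)$, then $z_1 = z_2$ and $a_1 = a_2$, so all three gradients are identical for the two units. One gradient descent step therefore leaves their parameters equal, and by induction they stay equal after every later step.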

It’s not the most pleasant of chores, but one can convince oneself of this fact with paper and pencil on a shallow neural network with a single hidden layer.
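If you would rather not do it by hand, the same check can be sketched numerically. This is only a minimal NumPy illustration under my own assumptions: the toy XOR-like data, the 3-unit tanh hidden layer, the learning rate, the 0.01 scale for the random weights, and the helper names `train` and `rows_identical` are all made up for this example and are not taken from the course materials.

```python
import numpy as np

def train(W1, b1, W2, b2, X, Y, lr=0.5, steps=200):
    """Batch gradient descent on a 1-hidden-layer net (tanh hidden, sigmoid output)."""
    m = X.shape[1]
    for _ in range(steps):
        # Forward pass
        A1 = np.tanh(W1 @ X + b1)
        A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))
        # Backward pass for the cross-entropy loss
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.mean(axis=1, keepdims=True)
        dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.mean(axis=1, keepdims=True)
        # Parameter update (creates new arrays, inputs are not modified)
        W1, b1 = W1 - lr * dW1, b1 - lr * db1
        W2, b2 = W2 - lr * dW2, b2 - lr * db2
    return W1, b1, W2, b2

def rows_identical(W):
    """True if every row of W matches the first row (up to float tolerance)."""
    return bool(np.allclose(W, W[0]))

# Toy data (assumption): 2 features, label is 1 when the features share a sign
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 200))
Y = (X[0] * X[1] > 0).astype(float).reshape(1, -1)

n_h = 3
zeros_h, zero_out = np.zeros((n_h, 1)), np.zeros((1, 1))

# Case 1: every hidden unit starts with the same parameters
W1_same, _, W2_same, _ = train(
    np.full((n_h, 2), 0.5), zeros_h, np.full((1, n_h), 0.5), zero_out, X, Y)

# Case 2: small random initial weights
W1_rand, _, W2_rand, _ = train(
    rng.normal(size=(n_h, 2)) * 0.01, zeros_h,
    rng.normal(size=(1, n_h)) * 0.01, zero_out, X, Y)

print("identical init  -> hidden units still identical:", rows_identical(W1_same))
print("random init     -> hidden units still identical:", rows_identical(W1_rand))
```

With identical starting values, each row of `W1` (one row per hidden unit) receives the same update at every step, so the three units never separate. With random starting values the rows begin different and gradient descent is free to move them independently.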
