Why do random weights help with faster convergence?

tbhaxor · January 18, 2023, 9:13am

A long time ago I read somewhere that random initializers are preferred for the weights because they help the network converge faster than zero or one initializers.

Why do you think this is the case? I couldn’t recall the link; maybe it was machinelearningmastery or some other blog.

Samuel_Chazy · January 18, 2023, 10:22am

Hi @tbhaxor,

Initializing weights with zeros or ones means that every weight starts at the same value. The algorithm will then take more time to converge, and may not converge at all or may settle in a local minimum rather than the global minimum. Random initializers spread the starting weights across lower and higher values, giving the algorithm a better chance of converging and reaching the global minimum.

tbhaxor · January 18, 2023, 10:24am

Can you show this mathematically?

Isaak_Kamau · January 18, 2023, 2:15pm

Nice question @tbhaxor

Here is my point of view & I STAND TO BE CORRECTED IF WRONG

I think that if we start with zeros or ones we will affect our ERROR FUNCTION, which is very important in neural network learning.

Here is a high-level summary of training a neural network:

1. Do a feedforward operation (the process a neural network uses to turn the input into an output).
2. Compare the output of the model with the desired output.
3. Calculate the error.
4. Run the feedforward operation backwards (backpropagation) to spread the error to each of the weights.
5. Use this to update the weights and get a better model.
6. Continue until we have a model that is good.
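The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration under assumptions that are not from the thread: a made-up toy dataset, a single sigmoid neuron, binary cross-entropy as the error, and a fixed number of epochs standing in for "until the model is good".

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 points, label is 1 when x0 + x1 > 0.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# Small random starting weights for a single sigmoid neuron.
W = rng.normal(scale=0.01, size=(2, 1))
b = np.zeros((1, 1))
lr = 0.5  # learning rate (an assumed value)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


losses = []
for epoch in range(200):
    # Step 1: feedforward turns the input into an output.
    p = sigmoid(X @ W + b)
    # Steps 2-3: compare with the desired output and calculate the error
    # (binary cross-entropy; the small constant avoids log(0)).
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    losses.append(float(loss))
    # Step 4: backpropagation gives the gradient of the error for each weight.
    dz = (p - y) / len(X)
    dW = X.T @ dz
    db = dz.sum(axis=0, keepdims=True)
    # Step 5: update the weights.
    W -= lr * dW
    b -= lr * db
# Step 6: the loop repeats; here it simply stops after a fixed 200 epochs.

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

With only one neuron there is no symmetry to break, so this sketch shows the loop itself rather than the effect of the initializer.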

As you can see in step 3, training a neural network is mostly about trying to minimize an error function (the error is inversely related to the probability the model assigns to the desired output). So we ‘throw in’ some numbers to the model, see how it performs, then compare its output with the desired output (desired output - model output = error). That is why I think starting with zeros or ones will hurt the training process.

There’s some nice mathematics behind it, but it would take hours if I decided to do it right here. Try looking for books on calculus for machine learning.

Anyone with a different idea is very welcome!

tbhaxor · January 18, 2023, 2:51pm

Backward because of the chain rule of differentiation, right?

tbhaxor · January 18, 2023, 2:55pm

Also, with real-world data, I have seen that with random weights it is easy to minimize the loss in the first 5 to 7 epochs, but with a constant like 0 or 1 it takes more than 5 to 7 epochs to, as I say, “actually start converging”. I haven’t gone into the maths yet, but this is my experience.

By real-world data, I mean any data that contains some kind of error (noise).
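One way to try this comparison yourself is sketched below. Everything in it is an assumption made for illustration, not the setup described in the post: the XOR dataset (noise-free, chosen because it needs a hidden layer), a 2-8-1 network with tanh and sigmoid units, plain full-batch gradient descent, and the helper name `train`.

```python
import numpy as np

# XOR: a tiny, exactly balanced problem that needs a hidden layer.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])
HIDDEN = 8  # number of hidden neurons (an assumed value)


def train(W1, W2, steps=5000, lr=0.3):
    """Full-batch gradient descent on a 2-HIDDEN-1 network; returns the loss history."""
    W1, W2 = W1.copy(), W2.copy()
    b1 = np.zeros((1, W1.shape[1]))
    b2 = np.zeros((1, 1))
    history = []
    for _ in range(steps):
        # Forward pass.
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        history.append(float(loss))
        # Backward pass (chain rule), computed before any weight changes.
        dz2 = (p - y) / len(X)
        dz1 = (dz2 @ W2.T) * (1 - h ** 2)
        # Gradient descent updates.
        W2 -= lr * (h.T @ dz2)
        b2 -= lr * dz2.sum(axis=0, keepdims=True)
        W1 -= lr * (X.T @ dz1)
        b1 -= lr * dz1.sum(axis=0, keepdims=True)
    return history


rng = np.random.default_rng(0)
zero_hist = train(np.zeros((2, HIDDEN)), np.zeros((HIDDEN, 1)))
ones_hist = train(np.ones((2, HIDDEN)), np.ones((HIDDEN, 1)))
rand_hist = train(rng.normal(size=(2, HIDDEN)), rng.normal(size=(HIDDEN, 1)))

for name, hist in [("zeros", zero_hist), ("ones", ones_hist), ("random", rand_hist)]:
    print(f"{name:>6}: loss after 7 steps = {hist[7]:.4f}, final loss = {hist[-1]:.4f}")
```

In this particular setup the all-zero network never moves at all: the hidden activations are zero, so every gradient is zero and the loss stays at log 2. Noisy real-world data will behave differently in detail, so treat the printed numbers as an illustration rather than a benchmark.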

Juan_Olano · January 18, 2023, 2:55pm

I’d like to add:

If you initialize the W weights to zero, you’ll find that all the neurons in a layer produce the same outputs and the NN will not learn.

This is the symmetry problem, and avoiding it is called Symmetry Breaking. To break the symmetry you initialize the weights with random numbers.

The mathematical proof is very simple: in a fully connected layer each neuron receives all the outputs of the previous layer, so if every neuron applies the linear function with the same weight values, the results will be exactly the same. The gradients are then identical too, so the neurons stay identical after every update.
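The argument can be checked numerically with a single forward and backward pass. The sketch below assumes a small 3-4-1 network with tanh hidden units and random made-up data; the helper names `hidden_and_grad` and `columns_identical` are hypothetical and only serve this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                          # 5 made-up samples, 3 features
y = rng.integers(0, 2, size=(5, 1)).astype(float)    # made-up binary labels


def hidden_and_grad(W1, W2):
    """One forward/backward pass; returns hidden activations and dLoss/dW1."""
    h = np.tanh(X @ W1)                       # each column is one hidden neuron
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))
    dz2 = (p - y) / len(X)
    dz1 = (dz2 @ W2.T) * (1 - h ** 2)
    return h, X.T @ dz1                       # gradient columns match the neurons


def columns_identical(a):
    """True when every column of `a` equals the first column."""
    return bool(np.allclose(a, a[:, :1]))


# Constant initialization: every hidden neuron starts with the same weights.
h_c, g_c = hidden_and_grad(np.full((3, 4), 0.5), np.full((4, 1), 0.5))
# Random initialization: every hidden neuron starts with different weights.
h_r, g_r = hidden_and_grad(rng.normal(size=(3, 4)), rng.normal(size=(4, 1)))

print("constant init: same outputs?", columns_identical(h_c),
      "same gradients?", columns_identical(g_c))
print("random init:   same outputs?", columns_identical(h_r),
      "same gradients?", columns_identical(g_r))
```

Because the constant-initialized neurons receive identical gradients, a gradient step changes them all by the same amount, so the layer keeps behaving like a single neuron no matter how long it trains.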

tbhaxor · January 18, 2023, 2:57pm

This is the answer I was actually looking for. I know I had heard this term before but couldn’t recall it. Thank you @Juan_Olano

Juan_Olano · January 18, 2023, 2:57pm

I’m glad we could help.

