Hello everybody,
I see that in the assignment where we build a NN from scratch, we randomly initialize the weights and then multiply them by 0.01.
I understand the randomization: it’s to break the symmetry.
I don’t understand why we multiply them by a small number (0.01). Is it to make the weights converge faster? If so, why?
Thank you,
Riccardo
Hello @Riccardo_Andreoni
we multiply the initial weights by 0.01 so that Z starts out close to zero, which is where activations like sigmoid and tanh have their steepest slope, so the gradients of g(Z) are large and learning proceeds quickly.
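As a minimal sketch (with hypothetical layer sizes n_x and n_h, not the assignment’s actual values), you can see how small random weights keep the entries of Z close to zero:

```python
import numpy as np

np.random.seed(1)
n_x, n_h = 4, 3                  # hypothetical layer sizes, for illustration

# Random values break symmetry; the 0.01 factor keeps the weights small
W1 = np.random.randn(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))

X = np.random.randn(n_x, 1)      # one made-up input example
Z1 = np.dot(W1, X) + b1          # entries near 0, where sigmoid/tanh have
print(Z1)                        # their steepest slope, so gradients are large
```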
For more details, please see Symmetry Breaking versus Zero Initialization.
regards
Jenitta
It turns out that there are a number of different algorithms for random initialization, and there is no single “silver bullet” version that works best in all cases. If you look at the provided utility routines in the C1 Week 4 Assignment 2, you’ll see that they needed a more sophisticated algorithm called Xavier Initialization, which we will learn about in Course 2 of this series. You should go back and try using the “multiply by 0.01” method from the Step by Step assignment in the L-layer case and watch how much worse the convergence is.
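As a rough sketch of the idea (this is not the exact utility routine from the assignment), Xavier-style initialization divides each layer’s random weights by the square root of the previous layer’s size instead of multiplying by a fixed 0.01:

```python
import numpy as np

def initialize_parameters_xavier(layer_dims, seed=3):
    # Xavier-style scaling: randn / sqrt(n_prev) for each layer
    # (a sketch, not the course's exact utility function)
    np.random.seed(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = (np.random.randn(layer_dims[l], layer_dims[l - 1])
                                    / np.sqrt(layer_dims[l - 1]))
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

# Example: made-up layer sizes for a 4-layer network on 64x64x3 inputs
params = initialize_parameters_xavier([12288, 20, 7, 5, 1])
```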
The general answer to the question is that people have tried lots of different alternatives, and it turns out that smaller values generally work better. Note that one issue with larger values is that they can produce large absolute values of the linear output Z, which can end up “saturating” the sigmoid function. Even with 64-bit floating point, it’s pretty easy to get a z value that causes sigmoid to round to exactly 1, which makes the cost function return NaN. All it takes to hit that is z > 36.
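Here is a quick numpy demonstration of that saturation effect (the sigmoid helper and the label value below are generic illustrations, not the assignment’s exact code):

```python
import numpy as np

def sigmoid(z):
    # standard logistic function
    return 1.0 / (1.0 + np.exp(-z))

# In float64, exp(-37) is smaller than half of machine epsilon,
# so 1 + exp(-37) rounds to 1.0 and sigmoid returns exactly 1.0
a = sigmoid(np.array([37.0]))
print(a == 1.0)   # [ True]

# Cross-entropy cost: with a == 1.0, log(1 - a) is log(0) = -inf,
# and the term (1 - y) * log(1 - a) becomes 0 * (-inf) = NaN for y = 1
y = np.array([1.0])
cost = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
print(cost)       # nan, with a RuntimeWarning about log(0)
```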