Why do we have to multiply by 0.01 and not just take those random values as generated? Why do we also sample with `randn` and not just `rand`?
Small random values from a normal distribution are a good choice for the initial weight values. The 0.01 scaling keeps the weights small so that activations like sigmoid or tanh start out in their non-saturated region, where the gradients are large enough for learning to make progress.
Weight values can be either positive or negative, right? If you use `rand`, you get only positive values. A normal distribution is also a better model in general for "real world" statistical phenomena. If you start with all positive weights, but you need to learn some values that are negative, maybe it takes longer? Just an intuition, not a mathematical proof, of course. But the higher-level point is that everything is experimental here and there is no one universal right answer that works best in all cases.
So if you have a particular case, try both `randn` and `rand` with 0.01 scaling and see if you notice any difference in convergence and the accuracy of the resulting model. But as I mentioned, there is no universal answer, so even if you do find a case where `rand` happens to work better, I think what Prof Ng is saying is that you have a better chance of success if you start with `randn`. But you have to run the experiment to know for sure in a given case …