np.random.randn(5,10) * 0.01

Hi Everyone,

In the last lecture of Week 3, named Random Initialization, it was mentioned that the weight parameters w of a layer are usually initialized as np.random.randn(5,10) * 0.01 (assuming 5 neurons and 10 weights per neuron). It was taught that the reason we multiply by a smaller number like 0.01 is that, if we multiplied by a large number like 100 instead, the slope of the sigmoid curve would be close to 0 and gradient descent would be slow.
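For concreteness, here is a minimal sketch of the two initializations being compared (the layer sizes 5 and 10 are just the lecture's example; the names W_small, W_large, and the toy input x are mine, not from the course code):

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(10, 1)                 # one example with 10 input features

W_small = np.random.randn(5, 10) * 0.01    # the initialization recommended in the lecture
W_large = np.random.randn(5, 10) * 100     # the "large" alternative from the question
b = np.zeros((5, 1))

z_small = W_small @ x + b
z_large = W_large @ x + b

print(np.abs(z_small).max())   # roughly 0.01-0.1: sigmoid stays in its steep region
print(np.abs(z_large).max())   # in the hundreds: sigmoid is saturated, slope ~ 0
```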

But even when we use a smaller value like 0.01 (or any other value), wouldn't gradient descent slow down as the slope approaches 0? How can just using a value like 100 be problematic?

Thanks in advance

The slope of the sigmoid is only near zero at very large positive and negative values.

Around the origin, the slope is at its maximum.

This helps the gradient descent process work more quickly.
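To see this numerically, here is a quick sketch (the helper names are my own) that evaluates the sigmoid's derivative, sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)), at a few values of z:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_slope(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # derivative of the sigmoid

for z in [0.0, 1.0, 5.0, 100.0]:
    print(z, sigmoid_slope(z))
# 0.0   -> 0.25    (maximum slope, at the origin)
# 1.0   -> ~0.197
# 5.0   -> ~0.0066
# 100.0 -> ~0.0    (saturated: essentially no slope)
```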

How exactly does having a z value (wx + b) near 0, where the sigmoid curve's slope is high, help gradient descent, which moves the weights forward or backward by alpha * dJ/dw, run faster?

The magnitude of the slope (i.e. the gradient) is highest around the origin. Because the update is alpha * dJ/dw, and dJ/dw contains the activation's derivative as a factor via the chain rule, a higher slope produces larger changes in the weights during training; a near-zero slope produces almost no change.
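A rough sketch of how that plays out for a single hidden unit (the variable names and the placeholder upstream_grad value are illustrative, not from the course code): backprop multiplies the upstream gradient by g'(z), so a saturated unit gets an almost-zero update:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

alpha = 0.1
upstream_grad = 0.5        # whatever dJ/da the later layers pass back (made-up value)
x_input = 1.0              # the input feeding this particular weight

for z in [0.1, 100.0]:     # small-weight init vs. large-weight init
    a = sigmoid(z)
    local_slope = a * (1.0 - a)               # g'(z), from the chain rule
    dJ_dw = upstream_grad * local_slope * x_input
    print(z, alpha * dJ_dw)                   # size of the weight update
# z = 0.1   -> update ~ 0.012  (learning proceeds)
# z = 100.0 -> update ~ 0.0    (gradient vanishes, learning stalls)
```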