I’m writing regarding the video “Weight Initialization for Deep Networks”, around 3:34-3:40.

In this part of the video, it is said: “it’s trying to set each of the weight matrices w, you know, so that it’s not too much bigger than 1 and not too much less than 1.”

But the code `w = np.random.randn(...)` initializes a random variable with mean equal to 0. So, as I understand it, w is not too much bigger than 0 and not too much less than 0. In the end, which is correct, 0 or 1? If 0 were correct, it would contradict what the instructor had previously stated.

I would be so much grateful for your comments.
Thanks a lot!

A mean equal to 0 means that the variable can fluctuate into both the negative and the positive region, but it should not fluctuate too widely, because smaller weights converge faster and help speed up computation!

For example, one weight could be -0.5 and another +0.5, so the mean is 0, and so on…
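To make this concrete, here is a quick numpy sketch (the layer size of 1000 is just a made-up example) showing that `randn` gives weights centered on 0, and that the usual scaling factor (He initialization, `sqrt(2/n)`, for ReLU layers) only shrinks their spread, it does not move the mean away from 0:

```python
import numpy as np

np.random.seed(0)

n_prev = 1000  # fan-in of the layer (hypothetical size)

# Plain randn samples from N(0, 1): the weights center on 0.
w_plain = np.random.randn(n_prev)

# He scaling keeps the mean at 0 but shrinks the spread so layer
# outputs neither explode nor vanish (variance 2/n_prev).
w_he = np.random.randn(n_prev) * np.sqrt(2.0 / n_prev)

print(w_plain.mean())  # close to 0
print(w_he.mean())     # still close to 0
print(w_he.std())      # close to sqrt(2/1000), i.e. about 0.045
```

So both things are true at once: each individual weight is centered on 0, while the scaling is chosen so that the layer’s overall effect stays “not too much bigger than 1 and not too much less than 1”.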

I have another question: if the weights fluctuate around 0, couldn’t that lead to gradients that are too large, so that the updates jump over the optimal point? Is this a serious problem?


If the weights stay between 0 and 1 in magnitude, the gradients cannot become too large, e.g. 0.99*0.99 < 0.99: multiplying numbers smaller than 1 only makes the product smaller. The updates might still jump over the optimum, though, and that is why some optimization techniques reduce the learning rate as the number of training epochs increases, to avoid this long jump! Even with weights in the [-1, 1] range an update could overshoot the optimum; training is never guaranteed to find the best optimum, but you can stop once a good training accuracy is obtained, something close to the optimum!
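Both points above can be checked numerically. The first line shows the shrinking-product effect; the learning-rate schedule is just one common decay formula used for illustration, not necessarily the one from the course:

```python
# Repeatedly multiplying magnitudes below 1 shrinks the product,
# so the signal (or gradient) cannot blow up through such weights.
prod = 0.99 ** 50
print(prod)  # well below 0.99

# A simple inverse-time learning-rate decay: the step size shrinks
# as training progresses, reducing the chance of over-jumping.
lr0, decay = 0.1, 0.01
for epoch in (0, 100, 1000):
    lr = lr0 / (1 + decay * epoch)
    print(epoch, lr)  # 0.1 at epoch 0, then smaller and smaller
```

Smaller late-training steps trade speed for precision: early on you want big jumps toward the optimum, later you want small ones so you can settle near it.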