I’m writing about the video ‘Weight Initialization for Deep Networks’, at 3:34–3:40.
In this part of the video, the instructor says: “it’s trying to set each of the weight matrices w, you know, so that it’s not too much bigger than 1 and not too much less than 1.”
But the code w = np.random.randn… initializes a random variable with mean 0. So, as I understand it, w is not too much bigger than 0 and not too much less than 0. In the end, which is correct: 0 or 1? If it were 0, that would contradict what the instructor stated earlier.
I would be very grateful for your comments.
Thanks a lot!
A mean of 0 means that the weights can take both negative and positive values, but they should not fluctuate too much, because small weights help training converge faster and speed up computation!
For example, one weight could be -0.5 and another +0.5, giving a mean of 0, and so on.
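One way to see how "mean 0" and "not too much bigger or smaller than 1" can both hold is a small simulation. This is a minimal sketch, not the course's code: it assumes a purely linear network (no activation function) and assumes the weights are drawn from N(0, 1) and multiplied by sqrt(gain / n_in), since the rest of the original code line is elided. The names init_layer, forward, std and run and the gain parameter are hypothetical. With gain = 1 the weights have variance 1/n_in, so each layer roughly preserves the variance of its input (Var(z) = n_in * Var(w) * Var(x) ≈ Var(x)): the individual weights average out to about 0, while each layer's overall scaling effect stays close to 1.

```python
import math
import random


def init_layer(n_out, n_in, gain, rng):
    """Draw an n_out x n_in weight matrix: N(0, 1) samples scaled by sqrt(gain / n_in).

    Assumption: gain = 1.0 gives Var(w) = 1/n_in, the variance scaling commonly
    used to keep each layer's output on the same scale as its input.
    """
    scale = math.sqrt(gain / n_in)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_in)] for _ in range(n_out)]


def forward(x, weights):
    """Propagate x through purely linear layers (no activation function, by assumption)."""
    for w in weights:
        x = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]
    return x


def std(values):
    """Population standard deviation of a list of floats."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def run(gain, n=100, depth=20, seed=0):
    """Return (mean of all weights, std of the network output) for one random network."""
    rng = random.Random(seed)
    weights = [init_layer(n, n, gain, rng) for _ in range(depth)]
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # input with std close to 1
    all_w = [v for w in weights for row in w for v in row]
    mean_w = sum(all_w) / len(all_w)
    return mean_w, std(forward(x, weights))


for gain in (0.25, 1.0, 4.0):
    mean_w, out_std = run(gain)
    print(f"gain={gain}: mean(w)={mean_w:+.4f}, std of output after 20 layers={out_std:.3g}")
```

In every case the mean of the weights is close to 0. Because each layer multiplies the output variance by roughly gain, the output standard deviation after 20 layers scales roughly like gain**10: it stays near 1 for gain = 1, but vanishes for gain = 0.25 and explodes for gain = 4. That is why a per-layer factor only somewhat away from 1 matters in a deep network.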
I have another question: if the weights fluctuate around 0, could that lead to gradients that are too large, so that an update jumps over the optimal point? Is this a serious problem?
I would be very grateful for your comments.
Thanks a lot!
If the weights fluctuate around 0 with magnitudes below 1, it would not be possible for the gradients to become too large, e.g. 0.99 * 0.99 < 0.99. An update might still jump over the optimum, though, and that's why some optimization techniques reduce the learning rate as the number of training epochs increases, to avoid these long jumps! Still, even with weights in the [-1, 1] range, training could overshoot the optimum. It is never guaranteed to find the best optimum, but you can stop training once a good training accuracy is reached, i.e. something close to the optimum!
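The overshooting and the learning-rate reduction described above can be sketched on a toy problem. This is a minimal sketch assuming a one-dimensional quadratic loss f(w) = w**2 (optimum at w = 0) and a hypothetical decay schedule lr_t = 0.99 / (1 + 0.3 * t); the function name gradient_descent and the specific numbers are illustrative, and real optimizers and schedules differ.

```python
def gradient_descent(lr_schedule, w0=1.0, steps=50):
    """Plain gradient descent on the toy loss f(w) = w**2, whose gradient is 2*w."""
    w = w0
    history = [w]
    for t in range(steps):
        grad = 2.0 * w
        w = w - lr_schedule(t) * grad
        history.append(w)
    return history


# Fixed, fairly large learning rate: every step jumps over the optimum at w = 0.
fixed = gradient_descent(lambda t: 0.99)

# Hypothetical decaying schedule: lr_t = 0.99 / (1 + 0.3 * t).
decayed = gradient_descent(lambda t: 0.99 / (1.0 + 0.3 * t))

print("fixed lr, first steps:", [round(w, 3) for w in fixed[:4]])
print("fixed lr, final w:    ", fixed[-1])
print("decayed lr, final w:  ", decayed[-1])
```

With the fixed rate, each step multiplies w by 1 - 2 * 0.99 = -0.98, so w flips sign (jumps over 0) on every step and shrinks only slowly. With the decaying rate the early steps still overshoot, but the steps soon become small enough that w settles very close to 0.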