Weight Initialisation - random can be better than He?

Hi,

In week 1’s initialisation programming exercise we are shown that setting the random weights too large (with a scale factor of 10) leads to poor performance. We are then shown He initialisation, which performs much better.

However, if you remove the scale factor of 10 and draw the weights from a standard normal distribution, the performance on this dataset is marginally better than with He initialisation.

Random cost:

He cost:
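For reference, here is a minimal sketch of the three schemes being compared; the function name, layer sizes, and seed are placeholders, not the exercise's exact code:

```python
import numpy as np

def initialize_parameters(layer_dims, method="he", seed=3):
    """Initialise weights for an L-layer network.

    layer_dims : list of layer sizes, e.g. [2, 10, 5, 1]
    method     : "large_random" (scale 10, as in the exercise),
                 "random" (plain standard-normal draws), or "he".
    """
    np.random.seed(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        if method == "large_random":
            scale = 10.0                              # the exercise's *10 factor
        elif method == "random":
            scale = 1.0                               # plain N(0, 1) draws
        else:                                         # "he"
            scale = np.sqrt(2.0 / layer_dims[l - 1])  # He initialisation for ReLU layers
        params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * scale
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params
```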

Is He initialisation preferred because it generally performs better than random initialisation, or is the takeaway here that we should try both and see what works best on our data?

Thanks

It depends on the activation functions being used. If you want to read about the theoretical justification for why Xavier initialization is best for tanh, the DeepLearning.ai team has written a great article on the topic:

He initialization for ReLU activation functions follows the same line of reasoning.
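Roughly, the two schemes differ only in the variance they target. A small sketch with hypothetical layer sizes:

```python
import numpy as np

n_prev, n_curr = 512, 256  # hypothetical layer sizes

# Xavier/Glorot scaling: keeps the activation variance roughly constant
# through a tanh layer by scaling with sqrt(1 / n_prev).
W_xavier = np.random.randn(n_curr, n_prev) * np.sqrt(1.0 / n_prev)

# He scaling: the extra factor of 2 compensates for ReLU zeroing out
# roughly half of the pre-activations.
W_he = np.random.randn(n_curr, n_prev) * np.sqrt(2.0 / n_prev)
```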
