Hi,
In week 1’s initialisation programming exercise, we are shown that random weights that are too large (scaled by a factor of 10) lead to poor performance. We are then shown He initialisation, which performs much better.
However, if you remove the scale factor of 10 and draw the weights from a normal distribution with no extra scaling, the performance on the dataset is marginally better than with He initialisation.
Random cost:
He cost:
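For reference, here is a minimal sketch of the three schemes I am comparing. It is not the exercise's code: the function name `init_weights`, the `seed` argument, and the layer sizes `[2, 10, 5, 1]` are placeholders I made up, and I am assuming He initialisation means scaling standard normal draws by sqrt(2 / fan_in).

```python
import numpy as np

def init_weights(layer_dims, method="random", seed=3):
    """Build W/b arrays for each layer (hypothetical helper, not the course code).

    method: "large"  -> standard normal * 10
            "random" -> standard normal, no extra scaling
            "he"     -> standard normal * sqrt(2 / fan_in)
    """
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        fan_in, fan_out = layer_dims[l - 1], layer_dims[l]
        if method == "large":
            scale = 10.0
        elif method == "random":
            scale = 1.0
        elif method == "he":
            scale = np.sqrt(2.0 / fan_in)
        else:
            raise ValueError(f"unknown method: {method}")
        # Weights are drawn the same way in every case; only the scale differs.
        params[f"W{l}"] = rng.standard_normal((fan_out, fan_in)) * scale
        params[f"b{l}"] = np.zeros((fan_out, 1))
    return params

layer_dims = [2, 10, 5, 1]  # assumed sizes, for illustration only
for method in ("large", "random", "he"):
    params = init_weights(layer_dims, method)
    stds = {k: round(float(v.std()), 3) for k, v in params.items() if k.startswith("W")}
    print(method, stds)
```

With these assumed sizes the He scale for the first layer is sqrt(2 / 2) = 1, so "random" and "he" give the same first-layer weights and only differ in the later layers. That may be part of why the two costs end up so close on a small network.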
Is He preferred because it generally performs better than unscaled random initialisation, or is the takeaway here that we should try both and see which works best on our data?
Thanks