Week 4 - initialize_parameters_deep - W initialisation redefined for Exercise 2

I created the NN model on my own, and when doing the subsequent exercise I noticed a difference in the output. This was due to a new definition being used for the initialize_parameters_deep function.

Instead of W being initialised with:

  • random numbers * 0.01

they are instead initialised with:

  • random numbers / the square root of the previous layer's dimension

Is this a common practice, and is there any clear/intuitive rationale behind this change?

Full initialisation definition:

After the change:
parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) / np.sqrt(layer_dims[l-1])

Originally:
parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01

NB: I extracted the function definition using:

import inspect
print(inspect.getsource(initialize_parameters_deep))
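For anyone comparing, here is a minimal sketch of how that scaling fits into the loop of initialize_parameters_deep. This is my own reconstruction for illustration (the scaled flag is mine, not part of the course code); only the scaling factor on W differs between the two versions:

import numpy as np

def initialize_parameters_deep(layer_dims, scaled=True):
    # layer_dims: list of layer sizes, e.g. [n_x, n_h1, ..., n_y]
    parameters = {}
    L = len(layer_dims)
    for l in range(1, L):
        if scaled:
            # scaled initialisation used by the application notebook
            parameters['W' + str(l)] = (np.random.randn(layer_dims[l], layer_dims[l - 1])
                                        / np.sqrt(layer_dims[l - 1]))
        else:
            # simple initialisation from the Step by Step exercise
            parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        # biases are initialised to zero in both versions
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters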

Yes, that is a good observation. It turns out they are using a more sophisticated initialization algorithm called Xavier Initialization that we will learn about in Course 2 of this series. The reason they had to do that is that the simple version they had us build in the previous exercise just doesn't work well at all with this model and dataset. Try it and watch what happens: the convergence is terrible. It turns out that the choice of initialization algorithm is an important "hyperparameter", meaning a choice you need to make. There is no one "silver bullet" solution that works best in all cases. Prof Ng will explain this in much more detail in Course 2, so please stay tuned for that. There's just too much other stuff to cover here in Course 1.

As to why they did not mention this in the notebook, I'm not sure, but my guess is they didn't want to reveal that they had given you correct solutions to all the functions in the Step by Step exercise. Just my theory :nerd_face: … But perhaps the simpler reason is what I alluded to before: there's just too much new material to cover in one course, so they are saving it for later. Of course, if you had used their sophisticated init code, it would have failed the grader in the Step by Step exercise. :laughing:
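To see intuitively what the scaling does, here is a small stand-alone experiment (my own toy example, not from the course): push random inputs through a stack of ReLU layers and watch the size of the activations. With * 0.01 the signal shrinks towards zero at every layer, whereas dividing by sqrt(n_prev) keeps it at roughly the same scale:

import numpy as np

np.random.seed(0)
layer_dims = [100, 80, 60, 40, 20]          # arbitrary layer sizes for the demo
A0 = np.random.randn(layer_dims[0], 1000)   # 1000 random "examples"

for label, scale in [("* 0.01", lambda n_prev: 0.01),
                     ("/ sqrt(n_prev)", lambda n_prev: 1.0 / np.sqrt(n_prev))]:
    print(label)
    A = A0
    for l in range(1, len(layer_dims)):
        W = np.random.randn(layer_dims[l], layer_dims[l - 1]) * scale(layer_dims[l - 1])
        A = np.maximum(0, W @ A)            # ReLU layer with zero biases
        print(f"  layer {l}: std of activations = {A.std():.2e}")

With the * 0.01 version the activations (and hence the gradients) are vanishingly small by the last layer, which matches the terrible convergence described above.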


Thank you so much for the quick reply, Paul! I had suspected it was something along those lines, with this modified initialisation technique effectively acting as another hyperparameter, and it's great to get confirmation of this.

The course is really fantastic, so once I've finished my other courses I'll be sure to come back here and finish off the other modules!

Having experimented a little, I can't wait to start exploring all the methods to select hyperparameters, as the practice I did on other datasets really highlighted how difficult it is to select them well!

Yes, how to make and evaluate hyperparameter choices in a systematic fashion will be a major focus of the first two weeks of Course 2 and essentially all of Course 3, although Course 3 also focusses a bit more on the data side of things.

I've tried to build the NN on my own too and was scratching my head because of this. But I've found out about this in dnn_app_utils_v3.py.

True. With the original initialization, the cost gets 'stuck' at around 0.64 from iteration 600 onwards. I was amazed to see such a huge impact from seemingly such a small change. Butterfly effect! So glad to see this has been discussed already.
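For anyone who wants to reproduce that plateau without the course's dataset, below is a self-contained toy comparison (my own synthetic data and plain-numpy training loop, not the assignment's L_layer_model): the same 4-layer ReLU/sigmoid network is trained twice, once with each initialization. You should see the * 0.01 run stay near 0.69 (the cost of predicting 0.5 for everything) while the scaled run keeps improving.

import numpy as np

# synthetic binary-classification data (hypothetical, not the course's cat/non-cat set)
np.random.seed(1)
X = np.random.randn(20, 500)                                  # 20 features, 500 examples
Y = (np.sum(X[:5], axis=0, keepdims=True) > 0).astype(float)  # synthetic labels

layer_dims = [20, 64, 32, 16, 1]

def init(scaled):
    np.random.seed(3)                          # same random draws for a fair comparison
    p = {}
    for l in range(1, len(layer_dims)):
        scale = 1.0 / np.sqrt(layer_dims[l - 1]) if scaled else 0.01
        p['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * scale
        p['b' + str(l)] = np.zeros((layer_dims[l], 1))
    return p

def train(scaled, iters=1001, lr=0.1):
    p = init(scaled)
    L = len(layer_dims) - 1
    m = X.shape[1]
    for i in range(iters):
        # forward pass: ReLU hidden layers, sigmoid output layer
        A, cache = X, []
        for l in range(1, L + 1):
            Z = p['W' + str(l)] @ A + p['b' + str(l)]
            cache.append((A, Z))
            A = np.maximum(0, Z) if l < L else 1.0 / (1.0 + np.exp(-Z))
        cost = -np.mean(Y * np.log(A + 1e-8) + (1 - Y) * np.log(1 - A + 1e-8))
        if i % 200 == 0:
            print(f"  iteration {i:4d}  cost {cost:.4f}")
        # backward pass with gradient-descent updates
        dZ = A - Y                             # gradient of the cost w.r.t. Z at the output
        for l in range(L, 0, -1):
            A_prev = cache[l - 1][0]
            dW = dZ @ A_prev.T / m
            db = np.sum(dZ, axis=1, keepdims=True) / m
            if l > 1:
                Z_prev = cache[l - 2][1]                       # pre-activation of layer l-1
                dZ = (p['W' + str(l)].T @ dZ) * (Z_prev > 0)   # ReLU derivative
            p['W' + str(l)] -= lr * dW
            p['b' + str(l)] -= lr * db

print("* 0.01 initialization:")
train(scaled=False)
print("/ sqrt(n_prev) initialization:")
train(scaled=True)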