One other thought here is that initialization algorithms matter more than you might intuitively expect. If you are just using the simple version of initialize_parameters_deep that they had us build in DLS C1 W4 A1, then you should also try a more sophisticated init function for your deeper network. Take a look at the actual algorithm used in DLS C1 W4 A2 for initialize_parameters_deep: it's a version of the "He" initialization that Prof Ng shows us in DLS C2 W1. From the DLS C1 W4 A2 notebook, just click "File → Open" and have a look at the utility functions Python file.
And just as an illuminating experiment, try training the 4-layer network in C1 W4 A2 with the simple init function from W4 A1 and see how much worse the convergence is compared to the "He" initialization. As I said above, it's surprising and a bit counterintuitive that it makes that much of a difference.
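In case it's useful, here is a minimal sketch of the two schemes side by side. This is my own simplified version, not the course's exact utility code, and the exact scaling factor in the assignment may differ slightly:

```python
import numpy as np

def initialize_parameters_deep(layer_dims, method="he"):
    """Sketch of deep-network parameter initialization (not the course's exact code).

    layer_dims -- list like [n_x, n_h1, ..., n_y]
    method     -- "simple" scales random weights by 0.01,
                  "he" scales by sqrt(2 / n_prev), as in He et al. (good for ReLU)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)  # number of layers, including the input layer

    for l in range(1, L):
        if method == "he":
            scale = np.sqrt(2.0 / layer_dims[l - 1])
        else:
            scale = 0.01
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * scale
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))  # biases start at zero

    return parameters
```

The only difference is the scale applied to the random weights, which is what makes the result so counterintuitive.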
Yes, that should be helpful. Besides zero gradients, also look for any strange patterns, such as the same gradient values across samples in the same iteration, or the same gradient values across iterations. These are all useful pointers, and any of them should be easy to spot once you print the gradient values out.
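A simple printout like the sketch below is usually enough to surface those patterns. It assumes your backprop returns a dictionary of gradients keyed "dW1", "db1", and so on, as in the course assignments; adjust the keys to whatever your code uses:

```python
import numpy as np

def inspect_gradients(grads, iteration):
    """Print simple statistics that make suspicious gradient patterns easy to spot."""
    for name, g in sorted(grads.items()):
        zero_frac = np.mean(g == 0)                    # large fraction of zeros -> dead units / broken backprop
        unique_vals = np.unique(np.round(g, 8)).size   # very few distinct values is also suspicious
        print(f"iter {iteration:5d}  {name:>5s}  "
              f"mean={g.mean():+.3e}  std={g.std():.3e}  "
              f"zero_frac={zero_frac:.2f}  unique={unique_vals}")
```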
I implemented the gradient checking and everything went very well.
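For anyone else following along, this is roughly the check I implemented. It is a simplified sketch; cost_fn, grad_fn and the flattened theta vector are just placeholders for my own helper functions:

```python
import numpy as np

def gradient_check(cost_fn, grad_fn, theta, epsilon=1e-7):
    """Two-sided numerical gradient check (simplified sketch).

    cost_fn -- returns the scalar cost J(theta)
    grad_fn -- returns the analytic gradient dJ/dtheta (same shape as theta)
    theta   -- all parameters flattened into one 1-D float vector
    """
    grad_approx = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus = theta.copy()
        theta_plus[i] += epsilon
        theta_minus = theta.copy()
        theta_minus[i] -= epsilon
        grad_approx[i] = (cost_fn(theta_plus) - cost_fn(theta_minus)) / (2 * epsilon)

    grad = grad_fn(theta)
    # Relative difference: roughly < 1e-7 is great, > 1e-3 suggests a bug in backprop
    diff = np.linalg.norm(grad - grad_approx) / (np.linalg.norm(grad) + np.linalg.norm(grad_approx))
    return diff
```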
The really interesting part, though, was seeing the impact of parameter initialization on model performance. As you suggested, this is what made the difference.
First, not initializing b[l] to zero enabled the model to learn, even when the number of layers increased. However, it started learning only later, as documented by the cost graph below.
Using He initialization for W[l] then had another big impact: it enabled the model with more layers to learn faster, and the cost went down further.
This means that everything is clear for now. Thank you very much for your support. I hope this thread also helps other people with similar problems.
That's great news that the He initialization worked so much better. As I mentioned earlier, it's counterintuitive and almost shocking that such a seemingly small change in how you initialize would have such a big impact. We depend on the work of researchers like Prof Ng and his colleagues who figured all this stuff out, and thanks to Prof Ng for giving us such a good survey of the techniques.