Why does using a smaller NN seem to have a regularization effect?
Thanks
It is the case that using a smaller (simpler) NN architecture and using regularization are two different ways to approach the problem of overfitting (high variance). I am not a theoretician or an expert in any of this, so it’s also possible that the two methods are theoretically equivalent. But it is common in mathematics that things can be theoretically equivalent, yet not equivalent in practical terms.

Suppose you have an overfitting problem and want to address it by using a simpler network. The issue you immediately face is that there are lots of “degrees of freedom” in how you could go about that: you could decrease the number of neurons in some or all of the hidden layers, or you could try fewer hidden layers but perhaps with a few more neurons in some of them, such that the aggregate complexity is lower. For every choice like that, you then have to retrain your network and compare the results. So it is a complicated search space, and it requires some thinking about how to explore that space in an organized and efficient way.

As an alternative, consider how you could approach the same problem with L2 regularization: you only have one “knob” to turn, which is the \lambda value. That is a one-dimensional search space, so you could just pick a few \lambda values across a range from small to large and then fine-tune from there.
In other words, you could achieve the same goal by using a smaller NN architecture, but it might be easier to just start with a network that is a bit too complex (overkill) and then dial in just the right amount of regularization.
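For concreteness, here is a minimal sketch of that kind of \lambda sweep, assuming TensorFlow/Keras and that you already have `X_train, y_train, X_val, y_val` loaded for a binary classification task (those names and the layer sizes are just placeholders for illustration, not anything specific from the course):

```python
# A minimal sketch of a coarse lambda sweep for L2 regularization.
# Assumes TensorFlow/Keras and that X_train, y_train, X_val, y_val are
# already defined (binary classification, one feature vector per row).
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_model(lam, n_features):
    # Deliberately over-sized network; lambda controls how hard we rein it in.
    model = tf.keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(128, activation="relu",
                     kernel_regularizer=regularizers.l2(lam)),
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l2(lam)),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Coarse, roughly log-spaced grid; fine-tune around the best value afterwards.
for lam in [0.0, 0.001, 0.01, 0.1, 1.0]:
    model = build_model(lam, X_train.shape[1])
    history = model.fit(X_train, y_train,
                        epochs=20, batch_size=64,
                        validation_data=(X_val, y_val),
                        verbose=0)
    print(f"lambda={lam:<6} "
          f"train_acc={history.history['accuracy'][-1]:.3f} "
          f"val_acc={history.history['val_accuracy'][-1]:.3f}")
```

The point is that the whole search is one loop over a handful of scalar values, whereas each candidate in an architecture search would require defining and retraining a differently shaped model.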
Please note that I am just a fellow student and do not have any practical experience applying these techniques. So the above is just an idea that I’m suggesting, not something I’ve heard Prof Ng specifically state.
If I’m missing your point, please let me know.