Why Regularization Reduces Overfitting Lecture

Hi,

If the purpose of regularization is to cancel out the effect of some neurons, doesn’t that mean that in practice we shouldn’t think too much about our NN size? Can we just use regularization later to get the right size?

I think there is the kernel of a good intuition there! Maybe going into a little more detail would flesh it out:

The point is that when you are tuning your hyperparameters, you have a lot of degrees of freedom, which means the search space is huge and an exhaustive exploration is intimidating and potentially pretty expensive. Just the number of layers, the number of neurons per layer, and the choice of activation functions already give you an enormous search space. So rather than doing that exhaustive search, you can afford a bit of “overkill” in the complexity of your network and then “dial in” some regularization to damp down the overfitting.

If you think about tuning L2 regularization, for example, it is far simpler: you have only one hyperparameter to tune, the \lambda value, so it’s much less time consuming to find a good setting. A similar argument can be made for dropout, where you just need to tune the “keep probability” value. A small sketch of both knobs follows below.
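To make the “one knob each” point concrete, here is a minimal NumPy sketch. The function names, shapes, and values are made up for illustration and are not taken from the course assignments: the L2 term adds \lambda / (2m) times the sum of squared weights to the cost, and inverted dropout keeps each unit with probability keep_prob and rescales the survivors by 1 / keep_prob.

```python
import numpy as np

def l2_regularized_cost(cross_entropy_cost, weights, lambd, m):
    """Add the L2 penalty (lambd / (2*m)) * sum ||W||^2 to the base cost.

    `lambd` is the single hyperparameter you tune; larger values shrink
    the weights more aggressively and damp down overfitting.
    """
    l2_penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy_cost + l2_penalty

def dropout_forward(a, keep_prob, rng):
    """Inverted dropout on one layer's activations during training.

    `keep_prob` is the single hyperparameter: each unit is kept with this
    probability, and the surviving activations are scaled by 1/keep_prob
    so the expected values stay the same.
    """
    mask = (rng.random(a.shape) < keep_prob).astype(a.dtype)
    return a * mask / keep_prob

# Toy usage with made-up shapes, just to show the tuning knobs.
rng = np.random.default_rng(0)
m = 64                                    # number of training examples
weights = [rng.standard_normal((10, 5)), rng.standard_normal((5, 1))]
base_cost = 0.35                          # pretend cross-entropy cost
print(l2_regularized_cost(base_cost, weights, lambd=0.7, m=m))
print(dropout_forward(rng.standard_normal((5, m)), keep_prob=0.8, rng=rng).shape)
```

The only knobs in that sketch are `lambd` and `keep_prob`, which is exactly why tuning the regularization is so much cheaper than re-searching the whole architecture.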

Of course the size of your network has pretty direct implications for the training cost, so you don’t want to go too far overboard in terms of the total number of layers and neurons.
