Decreasing Regularization

In C3W1 (Bird Recognition), problem number 10:

The answers to the question are (1) Train a bigger model and (2) Try decreasing regularization.
I understand answer (1), but I do not understand how decreasing regularization can help in this situation.

Because we are currently underfitting the training data by using too much regularization, which has resulted in a model that is too simple. Decreasing the regularization relaxes that constraint and gives the model back the capacity it needs to fit the training set:

A model that exhibits small variance and high bias will underfit the target, while a model with high variance and little bias will overfit the target.
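
To make the mechanism concrete, here is a minimal Keras sketch (the layer sizes and penalty strengths are invented for illustration): a large L2 penalty squashes the weights toward zero and underfits, while a smaller penalty restores capacity.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Illustrative only: the layer sizes and lambda values are made up.
def build_model(l2_lambda):
    return tf.keras.Sequential([
        layers.Dense(128, activation="relu",
                     kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.Dense(1, activation="sigmoid"),
    ])

# Too much regularization pushes the weights toward zero:
# high bias, and the model underfits even the training set.
underfit_model = build_model(l2_lambda=1.0)

# Decreasing the penalty lets the weights grow, restoring capacity.
better_model = build_model(l2_lambda=1e-4)
```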

Thank you for your answer.
I have one more question.

In the course, I learned as follows:
If we are “underfitting” the training data → larger network, train longer
If we are “overfitting” the training data → regularization, more data
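
As a toy sketch, that recipe amounts to a simple decision rule (the error numbers and thresholds here are invented, with human-level error standing in for Bayes error):

```python
# Hypothetical errors, just to illustrate the decision rule.
human_err, train_err, dev_err = 0.01, 0.15, 0.16

high_bias = (train_err - human_err) > 0.05    # underfitting the training set
high_variance = (dev_err - train_err) > 0.05  # overfitting the training set

if high_bias:
    print("Try a bigger network, train longer, or decrease regularization.")
elif high_variance:
    print("Try more data, regularization, or a smaller architecture.")
else:
    print("Done: both bias and variance look acceptable.")
```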

So, if we are “overfitting” the training data, can we use a “smaller” network and train for a “shorter” time? Would this be as appropriate as applying more regularization and getting more data?

Using a smaller network and regularizing the network are similar in the sense that regularization results in fewer weights doing the heavy lifting, which has much the same effect as using a smaller network in which all of the weights participate in the heavy lifting.
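
One way to see the “fewer weights doing the heavy lifting” effect is with an L1 penalty, which drives unneeded weights exactly to zero. Here is a toy scikit-learn sketch (the data and penalty strength are invented):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Toy data: only the first of 10 features actually matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] + 0.5 * rng.normal(size=200)

print(np.round(LinearRegression().fit(X, y).coef_, 2))  # all 10 weights nonzero
print(np.round(Lasso(alpha=0.1).fit(X, y).coef_, 2))    # most weights exactly 0
```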

During hyperparameter tuning, the first thing to make sure of is that you train for the right number of epochs, i.e., that you stop before you start overfitting because you have trained for too long. Hence, you always use early stopping in some form.
Next, you check how your performance on the validation set compares to the training set. If you overfit, you add more data if possible; otherwise, you start regularizing: penalizing weights, adding dropout, decreasing the network size, or any other method you can think of.
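
Putting those pieces together, a minimal Keras sketch might look like the following (the data, architecture, and hyperparameters are all placeholders):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, regularizers, callbacks

# Dummy data, just so the example runs end to end.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(800, 20)), rng.integers(0, 2, 800)
X_val, y_val = rng.normal(size=(200, 20)), rng.integers(0, 2, 200)

model = tf.keras.Sequential([
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # penalize weights
    layers.Dropout(0.5),                                     # dropout
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Stop once validation loss stops improving, and roll back to the best epoch.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True)

model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=200, callbacks=[early_stop], verbose=0)
```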

Goodfellow writes

in practical deep learning scenarios, we almost always do find that the best fitting model (in the sense of minimizing generalization error) is a large model that has been regularized appropriately…
