Regularization increasing bias, but dev set performance stays the same

Daniel_Espinosa · May 23, 2022, 8:14am

Hi!

I’m working on a problem as I go through the course so I can apply what I learn. So far everything has been incredibly useful, but when applying L2 or L1 regularization I’m unsure whether the resulting model is better or worse than before.

Before L2 regularization:

human error <0.5%
training set error <1%
dev set error 6%

Here I decided to apply L2 regularization, since variance seemed to be the biggest issue.

With L2 regularization:

training set error 4%
dev set error 6% (same as before)

So the model seems to perform the same on unseen data whether or not it’s overfitting the training set.
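One way to check this is the bias/variance split used in the course: avoidable bias is roughly training error minus human-level error, and variance is roughly dev error minus training error. Below is a minimal sketch. The function name `diagnose` is hypothetical, and treating the "<0.5%" and "<1%" bounds as exact values is an assumption.

```python
def diagnose(human_error, train_error, dev_error):
    """Split errors (in percent) into avoidable bias and variance,
    using human-level error as a proxy for Bayes error (an assumption)."""
    avoidable_bias = train_error - human_error
    variance = dev_error - train_error
    return avoidable_bias, variance

# The "<" upper bounds from the post are treated as exact values here (assumption).
before = diagnose(human_error=0.5, train_error=1.0, dev_error=6.0)
after = diagnose(human_error=0.5, train_error=4.0, dev_error=6.0)
print(f"before L2: avoidable bias = {before[0]:.1f}%, variance = {before[1]:.1f}%")  # 0.5%, 5.0%
print(f"with L2:   avoidable bias = {after[0]:.1f}%, variance = {after[1]:.1f}%")    # 3.5%, 2.0%
```

Under these assumptions, L2 shifted about 3 points of error from variance into avoidable bias, which is consistent with the dev error staying at 6%.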

The other problem I have is that the regularized model takes around 10 times longer to train.

My question now is: is the model with L2 regularization still better? My thinking is that it is, because it’s now generalizing better, so if I reduce the bias (by building a bigger model or some other technique), the variance will probably go down as well. Is my thinking correct? I also don’t really know how to weigh the extra training time.

Thanks!

gent.spah · May 23, 2022, 11:01am

The training set error has increased but the dev set error is the same, so I would think that with L2 regularization the model is no longer overfitting the way it was before. These measurements also depend on the dataset size and distribution, i.e. whether the dev and training sets come from the same distribution. A more complex model will be able to fit the data better and capture more nonlinearities, which should improve performance in general. The increase in training time comes from the extra computations that the regularization adds.
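The point above about extra computation can be made concrete. In the course’s formulation, L2 adds (λ/2m)·Σ‖W‖² to the cost and (λ/m)·W to each weight gradient. Below is a minimal NumPy sketch. It assumes the unregularized cost and gradients have already been computed by forward and backward propagation. The function name and the dictionary layout (`"W1"`, `"dW1"`, ...) are hypothetical.

```python
import numpy as np

def l2_cost_and_grads(cross_entropy_cost, grads, weights, lambd, m):
    """Add the L2 penalty to an already-computed cost and its weight gradients.

    cross_entropy_cost: unregularized cost J (float)
    grads: dict of unregularized gradients, e.g. {"dW1": ..., "db1": ...}
    weights: dict of weight matrices to regularize, e.g. {"W1": ..., "W2": ...}
    lambd: regularization strength; m: number of training examples
    """
    # Extra cost term: (lambd / (2m)) * sum of squared weights over all layers.
    penalty = sum(np.sum(np.square(W)) for W in weights.values())
    cost = cross_entropy_cost + lambd / (2 * m) * penalty

    # Extra gradient term: (lambd / m) * W for each weight matrix.
    # Bias gradients are copied through unchanged (biases are not regularized).
    reg_grads = dict(grads)
    for name, W in weights.items():
        reg_grads["d" + name] = grads["d" + name] + (lambd / m) * W
    return cost, reg_grads
```

The extra work per iteration is one sum of squares and one scaled addition per weight matrix, which is small compared with the matrix multiplications in forward and backward propagation.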

Daniel_Espinosa · May 23, 2022, 5:44pm

Thanks for the answer! I guess my intuition was so far off. One thing though: I meant that it now takes 10x as many iterations to reach the same dev performance, while each iteration is only slightly slower with L2 regularization. Is this also expected? So the bulk of the extra time comes from the extra iterations. I even played around with the learning rate to try to speed things up, but the L2 version still needs a lot more iterations.

gent.spah · May 23, 2022, 6:30pm

One other aspect of a regularized model is that the fit becomes simpler, so it can take more iterations to learn than it would without regularization.
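One way to see how L2 constrains the fit is to substitute the regularized gradient into the gradient-descent update. Every step then first shrinks the weights by a constant factor, which is why L2 is also called "weight decay". Here α is the learning rate, λ the regularization strength, m the number of training examples, and dW^{[l]}_backprop the unregularized gradient:

$$
W^{[l]} := W^{[l]} - \alpha\left(dW^{[l]}_{\text{backprop}} + \frac{\lambda}{m}W^{[l]}\right) = \left(1 - \frac{\alpha\lambda}{m}\right)W^{[l]} - \alpha\, dW^{[l]}_{\text{backprop}}
$$

Because the shrink factor is applied at every iteration, the weights are continually pulled toward smaller values that the data gradient has to overcome. This is one way to read the point that the regularized fit is simpler.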
