Regularization increasing bias, but dev set performance stays the same


I’m working on a problem as I go through the course to apply the knowledge as I learn. So far everything has been incredibly useful, but after applying L2 or L1 regularization I’m unsure whether the resulting model is better or worse than before.

Before L2 regularization:

  • human error <0.5%
  • training set error <1%
  • dev set error 6%

Here I decided to apply L2 regularization, since variance seemed to be the biggest issue.

With L2 regularization:

  • training set error 4%
  • dev set error 6% (same as before)
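In case it helps, this is roughly how I’m computing the regularized cost (a minimal NumPy sketch; the function name, `lambd`, and the toy numbers below are placeholders, not my actual values):

```python
import numpy as np

def l2_regularized_cost(base_cost, weights, lambd, m):
    """Add the L2 penalty (lambd / 2m) * sum of squared weights to the base cost."""
    l2_penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
    return base_cost + l2_penalty

# Toy example: base cross-entropy cost 0.30, two all-ones weight matrices,
# lambd = 0.7, m = 1000 training examples.
weights = [np.ones((3, 2)), np.ones((2, 1))]
cost = l2_regularized_cost(0.30, weights, lambd=0.7, m=1000)
```

The penalty only sums the weight matrices, not the biases, which is the usual convention.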

So the model seems to perform the same on unseen data regardless of whether or not it is overfitting the training set.

The other problem I have is that the regularized model takes around 10 times longer to train.

My question now is: is the model with L2 regularization still better? My thinking is that it is, because it now generalizes more, so if I reduce the bias (by building a bigger model or some other technique) the variance would probably go down as well. Is my thinking correct? I also don’t really know how to weigh the training time.


The training set error has increased but the dev set error is the same, so I would think that with L2 regularization the model is no longer overfitting the way it was before. These measurements also depend on dataset size and distribution, i.e. whether the dev and train sets come from the same distribution or not. A more complex model will be able to fit the data better and provide better-fitting nonlinearities, which should improve performance in general. The increase in training time is attributable to the extra computations added by the regularization term.

Thanks for the answer! I guess my intuition was not so far off. One thing, though: I meant that it now takes 10x the number of iterations to reach the same dev performance, while each iteration is only slightly slower with L2 regularization. Is this also expected? So the bulk of the extra time comes from the extra iterations needed. I even played around with the learning rate to try to speed things up, but the version with L2 still needs far more iterations.

One other aspect of a regularized model is that the fitting is constrained to stay simpler, and thus it takes more iterations to learn than it would without regularization.
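As a rough illustration (toy numbers, and assuming the standard `(lambd / m) * W` term added to the gradient), the L2 term shrinks each weight on every update, so each step makes less net progress toward the data-fitting minimum:

```python
import numpy as np

# One hypothetical gradient-descent step on a single weight.
W = np.array([2.0])
lr, lambd, m = 0.1, 5.0, 10
grad = np.array([0.5])  # hypothetical gradient of the data-fitting cost

W_plain = W - lr * grad                    # update without regularization
W_l2 = W - lr * (grad + (lambd / m) * W)   # update with the L2 ("weight decay") term
```

Here `W_l2` ends up smaller than `W_plain` because part of the step is spent decaying the weight toward zero rather than fitting the data, which is consistent with needing more iterations to reach the same dev performance.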