Degree of polynomial vs regularization?

Hello Mehmet!

You have suggested a few ways, and let me summarize them:

  1. Arbitrarily choose a degree and then look for the best \lambda
  2. Just go through approach (1)
  3. Go through approach (1) and then add the regularization and look for the best \lambda

I think all three ways are reasonable. There is no single best way to do it; what matters is that you end up with the model that performs best on the cv dataset. Let’s discuss each of the 3 ways above:

Way no. 1

Let’s say we pick deg = 4 and train a model with regularization enabled, arbitrarily setting \lambda = 0.1. We train the model on the training set and evaluate it on the cv set. Then we start asking the questions which Andrew mentioned in the videos, and we come to a conclusion about whether we are overfitting or underfitting. If we are overfitting, we have 2 choices: either reduce deg or increase \lambda. Note that both choices can give you a very good model, but at the end of the day, the decision will be based on the performance on the cv dataset.
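To make Way no. 1 concrete, here is a minimal sketch using NumPy. Everything in it is an assumption for illustration: the synthetic noisy-quadratic data, the helper names `design`, `fit_ridge` and `mse`, and the closed-form solve, which stands in for whatever training routine you actually use. It fits one model with deg = 4 and \lambda = 0.1, reports the training and cv errors, and then tries the two remedies for overfitting (a smaller deg, or a larger \lambda) so they can be compared on the same cv set.

```python
import numpy as np

def design(x, deg):
    # Feature matrix with columns x^0, x^1, ..., x^deg (x^0 is the bias column).
    return np.column_stack([x ** d for d in range(deg + 1)])

def fit_ridge(x, y, deg, lam):
    # Closed-form regularized least squares; the bias weight is not penalized.
    X = design(x, deg)
    penalty = lam * np.eye(deg + 1)
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def mse(w, x, y):
    # Mean squared error of weights w on (x, y); the degree is inferred from w.
    return float(np.mean((design(x, len(w) - 1) @ w - y) ** 2))

# Synthetic data (assumed for this sketch): a noisy quadratic.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0, 0.2, 60)
x_train, y_train = x[:40], y[:40]
x_cv, y_cv = x[40:], y[40:]

# One arbitrary starting point: deg = 4, lambda = 0.1.
deg, lam = 4, 0.1
w = fit_ridge(x_train, y_train, deg, lam)
train_err = mse(w, x_train, y_train)
cv_err = mse(w, x_cv, y_cv)
print(f"deg={deg} lambda={lam} train MSE={train_err:.4f} cv MSE={cv_err:.4f}")

# If cv error is much larger than train error, try either remedy
# and let the cv set decide between them.
for d, l in [(3, 0.1), (4, 1.0)]:
    w_alt = fit_ridge(x_train, y_train, d, l)
    print(f"deg={d} lambda={l} cv MSE={mse(w_alt, x_cv, y_cv):.4f}")
```

Whichever candidate gives the lowest cv error becomes the next starting point, and the same questions are asked again.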

Way no. 2

Let’s say we have results for deg from 1 to 10. We compare their performance on the same cv dataset and find deg = 2 to be the best. Then we start asking ourselves the questions which Andrew mentioned in the videos, and we come to a conclusion about whether we are overfitting or underfitting. If we are overfitting, this time we enable regularization and arbitrarily try a \lambda > 0 to see whether it brings us a better model with respect to the cv dataset. This time we won’t need to try a smaller deg, because we have already tried it.
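Way no. 2 can be sketched in the same spirit. Again, the data, the helper names `design`, `fit` and `cv_error`, and the candidate \lambda values are all assumptions for illustration, not anything from the course materials. The sketch first sweeps deg from 1 to 10 without regularization and picks the degree with the lowest cv error, then keeps that degree fixed and tries a few \lambda values.

```python
import numpy as np

def design(x, deg):
    # Columns x^0 .. x^deg.
    return np.vander(x, deg + 1, increasing=True)

def fit(x, y, deg, lam=0.0):
    # Regularized least squares written as an augmented least-squares problem,
    # which also works for lam = 0 and for poorly conditioned high degrees.
    X = design(x, deg)
    reg = np.sqrt(lam) * np.eye(deg + 1)
    reg[0, 0] = 0.0  # do not penalize the bias weight
    A = np.vstack([X, reg])
    b = np.concatenate([y, np.zeros(deg + 1)])
    return np.linalg.lstsq(A, b, rcond=None)[0]

def cv_error(w, x, y):
    # Mean squared error; the degree is inferred from the length of w.
    return float(np.mean((design(x, len(w) - 1) @ w - y) ** 2))

# Synthetic data (assumed for this sketch): a noisy quadratic.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 80)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0, 0.2, 80)
x_train, y_train = x[:50], y[:50]
x_cv, y_cv = x[50:], y[50:]

# Step 1: sweep the degree with regularization off, compare on the same cv set.
deg_errors = {d: cv_error(fit(x_train, y_train, d), x_cv, y_cv)
              for d in range(1, 11)}
best_deg = min(deg_errors, key=deg_errors.get)

# Step 2: keep the best degree and see whether some lambda > 0 helps.
lam_errors = {lam: cv_error(fit(x_train, y_train, best_deg, lam), x_cv, y_cv)
              for lam in (0.0, 0.01, 0.1, 1.0, 10.0)}
best_lam = min(lam_errors, key=lam_errors.get)

print(f"best deg={best_deg}, cv MSE={deg_errors[best_deg]:.4f}")
print(f"best lambda={best_lam}, cv MSE={lam_errors[best_lam]:.4f}")
```

Because \lambda = 0 is among the candidates, the second step can never report a worse cv error than the first; it only tells you whether regularization buys anything on top of the chosen degree.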

Way no. 3

This is the same as Way no. 2, because in my discussion of Way no. 2 we do not stop at the model with the best degree; we go on to look into regularization if it overfits.

In summary, I want to tell you again that there is no single way of doing this. There is no best way. There is only your way, which you have walked through to get to the best model. You may walk this path and I may walk that path, and no one knows whether your final model or mine will do better until both of us deliver our models for a final comparison.

If you do a round of many trials (like Way no. 2), you will spend more time waiting for those results before you can start asking the questions which Andrew mentioned to decide whether it is overfitting or underfitting. If you do one model at a time (Way no. 1), you don’t need to wait that long. Waiting is the difference.

Cheers,
Raymond
