Week 3 model selection

I understand the rationale for adding a cross-validation set alongside the train/test split: the CV set is used to tune parameters and select features. But say there are 3 models under consideration. Model #2 has the lowest CV set error, so Model #2 is selected. But after we run the test set, Model #3 has the lowest test set error. Will we change from Model #2 to #3?

The big-picture goal is to minimize the test set error.

This is because the test set is an independent set of data that was not used in creating the model, so it is a good measure of how the model will perform in practice.


So in my example, we do change from model #2 to #3?

We pick the model that gives the lowest test set error.

@flyunicorn,
The question is tricky. Our goal is to minimize the test set error. But if we use the test set error to choose between models (e.g., switch to Model #3 because it performed better), then the test set is no longer unbiased: it becomes part of the model selection process and acts as another cross-validation set. In practice, it is not uncommon for the dev and test sets to be used interchangeably, especially in smaller or informal projects. This is not best practice, because your model may become overfit to the test set, giving a falsely optimistic performance estimate.