Model selection question

What's the problem here? I can't understand it, and how does using cross-validation make it better?

Hi @Saim_Rehman,

I have made up some cost values there

Given the four cost values, which d (among 1, 2, 3, 10) works the best?


d=2 should be, because it has the lowest cost.

Hello @Saim_Rehman,

Yes! Among the J_{test} values that I made up for our discussion, d=2 has the lowest cost and thus works best! It is great that you were able to step outside the slide (which assumed d=5 to be the best) and make the correct judgement based on your own knowledge! It is very important that we understand the rationale rather than just sticking to the slides.
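In code, this selection step is just an argmin over the candidate costs. The J_{test} numbers below are hypothetical, made up purely for illustration (just as in the discussion above):

```python
# Hypothetical J_test values for each candidate degree d (made-up numbers)
j_test = {1: 12.4, 2: 3.1, 3: 3.5, 10: 9.8}

# Pick the degree with the lowest cost
best_d = min(j_test, key=j_test.get)
print(best_d)  # -> 2, since 3.1 is the smallest cost
```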


My question was: why is choosing based on J_{test} a problem, as Andrew mentioned in the attached image, and how is using a CV set better?

Alright, I have just watched the video again; you are asking about the part starting from roughly 1:45.

The problem is choosing the best model based on J_{test} and then also reporting J_{test} as the generalization error. Your model comprises two parts: trainable parameters and hyperparameters, and both need to be set just right before you can say "I have trained a model." The training set gets you the best set of trainable parameters, but it does not tune the hyperparameters for you; d is a hyperparameter here.

If you then use the test set to choose the best d, both the "training set" and the "test set" have, in a broader sense, become data you used to create the model. Consequently, neither of them can give you a fair estimate of the generalization error, because that estimate should come from data the model has never seen. Note again that the model we created so far is based on both the training set and the test set, so it is expected to perform well on them. We need an unseen set of data to estimate the generalization error.

Therefore, the correct way is to choose the best model based on J_{cv} and report J_{test} as the generalization error: tune the trainable parameters with the training set, choose the best hyperparameters with the cv set, and keep the test set invisible to the entire model-creation process. Once the model is created, evaluate it on the test set and report that result as the generalization error.
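The procedure above can be sketched with scikit-learn. This is a minimal illustration, not the course's own code: the synthetic quadratic data, the 60/20/20 split, and the candidate degrees (1, 2, 3, 10) are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with a quadratic ground truth (made up for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=300)

# 60% train / 20% cv / 20% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_cv, X_test, y_cv, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

def fit_and_cv_cost(d):
    """Fit degree-d polynomial regression on the training set; score on the cv set."""
    poly = PolynomialFeatures(degree=d)
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    j_cv = mean_squared_error(y_cv, model.predict(poly.transform(X_cv)))
    return model, poly, j_cv

# Choose the hyperparameter d using the cv set only
candidates = [1, 2, 3, 10]
results = {d: fit_and_cv_cost(d) for d in candidates}
best_d = min(candidates, key=lambda d: results[d][2])

# The test set is touched exactly once, to report the generalization error
model, poly, _ = results[best_d]
j_test = mean_squared_error(y_test, model.predict(poly.transform(X_test)))
print(f"best d = {best_d}, J_test = {j_test:.3f}")
```

The key point the code makes concrete: `X_test`/`y_test` appear nowhere in the selection loop, so `j_test` remains a fair estimate of performance on unseen data.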