Model selection based on CV or test data, and the difference between CV and test data

That is a good question, and I know this confusion between the cross-validation and test datasets comes up most of the time. But if you think both datasets are used only to get predictions, that understanding is a bit incomplete.

Cross-validation data is basically used to check how good your model, fitted on the training data, really is while you are still building it, whereas test data checks how the final model, the one you chose with the help of the CV data, performs on data it has never seen.

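CV data gives you an evaluation of the model on data it was not fitted on, and you use it to fine-tune the model’s hyperparameters and to choose between models. Because you keep adjusting the model based on the CV score, that score ends up a bit optimistic. The test dataset is used only after the model has been fully trained (fully trained here means tuned and evaluated with respect to the CV data) to assess the model’s performance on completely unseen data, and that is the unbiased estimate. So you select the model on CV and report its performance on test.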
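If it helps to see the workflow in code, here is a minimal sketch in plain Python. Everything in it is an assumption made up for illustration: the toy data from `make_data`, the tiny k-nearest-neighbour model, the candidate values of `k`, and the 240/60 split. It is not the only way to do this, just the order of the steps: set the test data aside, pick the setting with cross-validation on the training data, then touch the test data once.

```python
import random

def make_data(n, rng):
    """Hypothetical toy data: one feature x in [0, 1], label 1 if x > 0.5, with 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y  # flip the label to simulate noise
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k is odd, so no ties)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return int(votes * 2 > k)

def accuracy(train, evaluate, k):
    """Fraction of points in `evaluate` predicted correctly by a k-NN model using `train`."""
    correct = sum(knn_predict(train, x, k) == y for x, y in evaluate)
    return correct / len(evaluate)

def cv_accuracy(train, k, n_folds=5):
    """Mean accuracy over n_folds; each fold is held out once as the CV (validation) part."""
    folds = [train[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, held_out in enumerate(folds):
        fit_part = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(accuracy(fit_part, held_out, k))
    return sum(scores) / n_folds

rng = random.Random(0)
data = make_data(300, rng)
train, test = data[:240], data[240:]  # the test set is put aside and not used for tuning

# Model selection: compare candidate settings using CV on the training data only.
candidates = [1, 5, 15]
cv_scores = {k: cv_accuracy(train, k) for k in candidates}
best_k = max(cv_scores, key=cv_scores.get)

# Final check: evaluate the chosen setting once on the untouched test set.
test_score = accuracy(train, test, best_k)
print("CV scores:", cv_scores)
print("chosen k:", best_k)
print("test accuracy:", test_score)
```

Note that `best_k` is decided from `cv_scores` alone. If you went back and changed `k` after looking at `test_score`, the test set would become just another CV set and its score would no longer be an honest estimate.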

Just imagine, in general terms: you would prefer to buy a quality product like an iPhone from an Apple showroom rather than from any ordinary shop. Why? Because you would expect the people at the Apple showroom to know more about what they are selling, so you would get a quality product.

CV data plays a similar role for the model: it is the quality check the model goes through before it ever meets the test dataset. As Tom mentioned, with CV data we optimise the model so that it performs as well as it can, whereas the test dataset is applied to the model that was trained on the training dataset and already checked against the CV data, to confirm that its performance is robust.

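So CV data is like a food inspector for a restaurant and the test dataset is like the customers. Both are basically trying and testing the food at the restaurant, but the inspector's feedback is used to fix the kitchen, while the customers judge what is finally served.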

I have given you two examples; I hope it is clear now.

Regards
DP
