Hi everyone, I just watched the video about how to use the cross-validation set to choose the best model. As I understand it, we use the training set and cross-validation set to choose the model, and then use the test set to evaluate that model's predictions. So what is the point of the test data? If the model is chosen using the training set and dev set, and it performs badly on the test set, do we stick with this model or choose another one? In other words, is the purpose of the test set here only to measure how well the model performs, without influencing the decision of which model to use?
Maybe this article will clear up your doubts.
The test set provides a final check on the performance of your completed system.
If the test set results are not good enough, then you go back to the beginning and improve the model.
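To make that division of roles concrete, here is a minimal sketch of the workflow in scikit-learn. The dataset, the two candidate models, and the 60/20/20 split sizes are illustrative assumptions, not something prescribed by the course:

```python
# Sketch: pick a model with train + dev, then use the test set once at the end.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, random_state=0)

# 60% train, 20% dev (cross-validation set), 20% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Model selection happens on the dev set only.
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
}
dev_scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    dev_scores[name] = model.score(X_dev, y_dev)
best_name = max(dev_scores, key=dev_scores.get)

# The test set is touched only once, for the final, unbiased estimate.
final_score = candidates[best_name].score(X_test, y_test)
print(f"chose {best_name}: dev={dev_scores[best_name]:.3f}, test={final_score:.3f}")
```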
Thanks! That makes it much clearer for me.
Got it! Thanks a lot for the answer!
Hi @Junxi_Li ,
It is weird that you get a good result on the validation set and a terrible result on the test set. What this tells me is that maybe the test dataset has a completely different distribution from the training and validation datasets.
If I were faced with this situation, I would immediately doubt my split and I would start again. I would:
- Do some EDA on my data to make sure the dataset is healthy.
- Reload and shuffle my data.
- Create the split. When creating the split I would consider stratification to make sure that each split has a balanced representation of all classes (particularly for classification models), and also, depending on the case, if applicable I would use grouping for cross-validation, again making sure the groups are within the same distribution (see the sketch below this list).
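If it helps, here is a rough sketch of the stratified / grouped re-split I am describing. The arrays `X`, `y`, and especially `groups` are placeholders for your reloaded data; the group IDs only apply if your samples are naturally grouped (e.g. by patient or user):

```python
# Sketch: shuffle + stratify the train/test split, and keep groups intact in CV.
import numpy as np
from sklearn.model_selection import GroupKFold, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))          # stand-in for the reloaded features
y = rng.integers(0, 3, size=1000)        # stand-in for class labels
groups = rng.integers(0, 50, size=1000)  # e.g. patient / user IDs, if applicable

# Stratified split: every class keeps the same proportion in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=0
)

# Group-aware cross-validation: no group is split across train and validation.
gkf = GroupKFold(n_splits=5)
for fold, (tr_idx, va_idx) in enumerate(gkf.split(X, y, groups)):
    assert set(groups[tr_idx]).isdisjoint(groups[va_idx])
    print(f"fold {fold}: {len(tr_idx)} train / {len(va_idx)} validation samples")
```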