About dev and test sets

abdou_brk · March 13, 2023, 12:43pm

Hi,
There is something I cannot understand: I don't see why we need a cross-validation set. I think the test error is a good measure of how the model is doing. Yes, when we train our model on the training data and measure performance on that same data, the result is misleading, because the model has already learned to fit those data. But when we choose a model based on test set errors, I don't see why we need an extra set to measure the final performance. It's not about the fact that we introduce a new parameter (for example, the degree here); the point is that the model has never seen the test data, so I think it's a good way to measure its performance…
I don't know, I can't see a strong logical reason for why we need that, and just saying "the error is likely to be an optimistic estimate of generalization error" isn't convincing to me.

gent.spah · March 13, 2023, 12:50pm

Well, yes, some models are trained with just two sets, but the purpose of the cross-validation set is to let you go back and optimize training again. So in a way the model sees the cross-validation set, but it never sees the test set at all.

Also, the more separate sets you have to test your model, the better, as long as they come from the same distribution.
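One way to see the "optimistic estimate" effect is a toy simulation. This is only a sketch under strong assumptions: the labels are pure noise, and every candidate "model" is a hypothetical coin-flipping classifier, so the true accuracy of each one is 50%. Picking the candidate with the best score on one set still makes that score look better than chance, while a set that played no part in the choice stays honest. The names `n_samples`, `n_models`, and the two label sets are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_models = 200, 500

# Pure-noise binary labels: no model can truly beat 50% accuracy here.
y_select = rng.integers(0, 2, n_samples)   # set used to choose the model
y_heldout = rng.integers(0, 2, n_samples)  # set never used for any decision

# Each row holds the predictions of one coin-flipping candidate model.
select_preds = rng.integers(0, 2, (n_models, n_samples))
heldout_preds = rng.integers(0, 2, (n_models, n_samples))

select_acc = (select_preds == y_select).mean(axis=1)
heldout_acc = (heldout_preds == y_heldout).mean(axis=1)

# Choose the candidate that looks best on the selection set.
best = int(np.argmax(select_acc))
selected_select_acc = float(select_acc[best])
selected_heldout_acc = float(heldout_acc[best])

print(f"accuracy on the set used for choosing: {selected_select_acc:.3f}")
print(f"accuracy on the untouched set:         {selected_heldout_acc:.3f}")
```

The chosen model never trained on the selection set, yet its score there is inflated simply because it was the maximum over many candidates. If the test set is the one used for choosing, it plays the role of `y_select` and no honest estimate is left.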

abdou_brk · March 14, 2023, 8:07pm

But the model didn't actually see the validation set. With the training set, the model is trained on the data: it tries to minimize the cost function based on the input features and the targets. That's not the case here. I still don't understand!

Christian_Simonis · March 14, 2023, 8:22pm

The test set should only be used once as a litmus test before deploying.

Usually you need the validation set because the architecture, hyperparameters, or features are adjusted and improved over several rounds; CRISP-DM is a highly iterative process.
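A minimal sketch of that workflow, assuming synthetic data and the polynomial degree as the only hyperparameter (the `make_data` and `mse` helpers, the data sizes, and the degree range are all made up for illustration): fit every candidate on the training set, pick the degree with the validation set, and touch the test set exactly once at the end.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Draw n noisy samples from an assumed ground-truth curve."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.2, n)
    return x, y

x_train, y_train = make_data(60)
x_val, y_val = make_data(60)
x_test, y_test = make_data(60)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

degrees = range(1, 11)

# Step 1: the training set fits the weights for every candidate degree.
fits = {d: np.polyfit(x_train, y_train, d) for d in degrees}

# Step 2: the validation set chooses the hyperparameter (the degree).
val_errors = {d: mse(fits[d], x_val, y_val) for d in degrees}
best_degree = min(val_errors, key=val_errors.get)

# Step 3: the test set is used once, only for the final report.
test_error = mse(fits[best_degree], x_test, y_test)

print(f"chosen degree: {best_degree}")
print(f"validation MSE: {val_errors[best_degree]:.4f}")
print(f"test MSE:       {test_error:.4f}")
```

The degree is itself fitted to the validation set in step 2, which is why the validation error of the winner tends to be optimistic and a separate test set is kept for the final number.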

Cross-validation is a nice approach to detect overfitting and see how well the model generalizes. Also, this thread might be worth a look for you, @abdou_brk: "How and why do training and cross validations sets wear out in time?" - #5 by Christian_Simonis

Hope that helps!

Best regards
Christian
