What is the reason behind having test set and dev set?

mr_raven · June 20, 2025, 3:28pm

As far as I understand from the material so far, the training set is used to estimate the optimal values for the weights and biases, the dev set is used to estimate generalization error for model selection, and the test set is used to estimate generalization error on unseen data after training is complete.

The rationale is that if we selected the model based on its error on the test set, our choice would already be biased toward whatever model performs best on that test set rather than on truly unseen data.

So, selecting the model based on its error on the dev set means that the error measured on the test set, after training is complete, tells us what the actual generalization error is.
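To make the three roles concrete, here is a minimal sketch of the workflow described above. The toy data, the 60/20/20 split, and the choice of a 1-D k-nearest-neighbour regressor with `k` as the thing being selected are all assumptions made for illustration; none of them come from the course material.

```python
import random

def knn_predict(train, x, k):
    """Average the labels of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error on `data` of a k-NN regressor fitted on `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def split(examples, rng, frac_train=0.6, frac_dev=0.2):
    """Shuffle the labeled pool and cut it into train / dev / test."""
    pool = examples[:]
    rng.shuffle(pool)
    n_train = int(frac_train * len(pool))
    n_dev = int(frac_dev * len(pool))
    return pool[:n_train], pool[n_train:n_train + n_dev], pool[n_train + n_dev:]

rng = random.Random(0)

# Hypothetical toy data: y = x^2 plus a little noise.
examples = []
for _ in range(200):
    x = rng.uniform(-1, 1)
    examples.append((x, x * x + rng.gauss(0, 0.1)))

train, dev, test = split(examples, rng)

# Model selection: only the dev set is consulted here.
candidates = [1, 3, 5, 15, 45]
dev_errors = {k: mse(train, dev, k) for k in candidates}
best_k = min(dev_errors, key=dev_errors.get)

# Final check: the test set is touched once, after the choice is made.
test_error = mse(train, test, best_k)
print(f"chosen k = {best_k}, dev MSE = {dev_errors[best_k]:.4f}, test MSE = {test_error:.4f}")
```

The dev error of the chosen model tends to look a little optimistic, because `best_k` was picked precisely for doing well on the dev set; the test error is not affected by that choice.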

Now, say I select the model based on the dev set. I finish training my model on the training set. Then, after calculating the generalization error on the test set, I find that the model performs poorly.

I would guess that I now rethink my approach and try a different one. If that is the case, am I not still being influenced by the results on the test set?

If I do not change my approach after poor results on the test set, why use the test set in the first place?

TMosh · June 20, 2025, 4:25pm

That’s not exactly what we’re doing.

We’re using the test set as a spot-check on how the completed model performs. If it’s not acceptable, then we can go back to square one and create an entirely new model.

Note that the sets are all randomly selected from the pool of labeled examples. So you can always try many random splits using the same model architecture, train and validate anew, and then look at a whole family of test set results (each computed on a test set held out from that split's training and validation; since the splits are drawn from the same pool, the results are only approximately independent). This would give a better statistical grip on how well the finished model works.
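A rough sketch of the repeated-split idea might look like the following. The toy data, the 60/20/20 split, the 1-D k-nearest-neighbour model, and the helper names `knn_mse` and `run_once` are all hypothetical and only there to illustrate the procedure. Because every split is drawn from the same pool, the test sets overlap between runs, so the spread is best read as a rough indication rather than an exact error bar.

```python
import random
import statistics

def knn_mse(train, data, k):
    """Mean squared error of 1-D k-NN regression fitted on `train`."""
    total = 0.0
    for x, y in data:
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        pred = sum(label for _, label in nearest) / k
        total += (pred - y) ** 2
    return total / len(data)

def run_once(examples, seed, candidates=(1, 5, 15)):
    """One fresh split: fit on train, pick k on dev, score once on test."""
    pool = examples[:]
    random.Random(seed).shuffle(pool)
    n = len(pool)
    train = pool[: 6 * n // 10]
    dev = pool[6 * n // 10 : 8 * n // 10]
    test = pool[8 * n // 10 :]
    best_k = min(candidates, key=lambda k: knn_mse(train, dev, k))
    return knn_mse(train, test, best_k)

# Hypothetical toy data: y = x^2 plus a little noise.
data_rng = random.Random(0)
examples = []
for _ in range(150):
    x = data_rng.uniform(-1, 1)
    examples.append((x, x * x + data_rng.gauss(0, 0.1)))

# Same procedure, ten different random splits.
test_errors = [run_once(examples, seed) for seed in range(10)]
print(f"mean test MSE: {statistics.mean(test_errors):.4f}")
print(f"std dev:       {statistics.stdev(test_errors):.4f}")
```

Looking at the mean and spread of `test_errors` says more about the procedure as a whole than any single test result does.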

mr_raven · June 20, 2025, 5:26pm

What is the difference between choosing the model based on test set results during training and choosing a new model from square one based on test set results after the model is complete?

TMosh · June 20, 2025, 5:41pm

We’re not directly modifying the model by using the test set.
We’re either accepting the model, or rejecting it and starting over.
