How and why do test and cross-validation sets wear out over time?

In Google's Machine Learning Crash Course, I read an interesting tip about test and validation sets. They say:

“Test sets and validation sets ‘wear out’ with repeated use. That is, the more you use the same data to make decisions about hyperparameter settings or other model improvements, the less confidence you’ll have that these results actually generalize to new, unseen data.
If possible, it’s a good idea to collect more data to ‘refresh’ the test set and validation set. Starting anew is a great reset.”

How is that possible? How and why should it wear out? And what does “wearing out of the data” mean to begin with? Also, can I solve this by randomly re-partitioning the data into three sets (train, test, CV) for each model, rather than using the same partitions for all?

Hi Mehmet,

The idea behind the test or validation set “wearing out” is that if you use the same data repeatedly to make decisions about your model, you gradually overfit to that particular data. Overfitting occurs when a model is tied too tightly to the data it was tuned on and performs poorly on new, unseen data. This means that the more often you use the same test or validation set, the less confident you can be that your model will generalize well to new data.
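To make the effect concrete, here is a small self-contained simulation (plain Python; the setup and all names are my own, not from the course). Two hundred random “models” — stand-ins for hyperparameter settings — are scored on the same validation set. Purely by chance, the best validation score looks impressive, but that chosen model falls back to chance level on fresh test data: the validation set has “worn out”.

```python
import random

rng = random.Random(42)

# Synthetic task with no learnable signal: labels are coin flips,
# so every "model" truly has 50% accuracy on unseen data.
n_val, n_test = 50, 1000
val_labels = [rng.randint(0, 1) for _ in range(n_val)]
test_labels = [rng.randint(0, 1) for _ in range(n_test)]

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Try 200 random "hyperparameter settings" (here: random predictors)
# and keep the one that scores best on the SAME validation set.
best_val_acc, best_model_seed = 0.0, None
for seed in range(200):
    model_rng = random.Random(seed)
    preds = [model_rng.randint(0, 1) for _ in range(n_val)]
    acc = accuracy(preds, val_labels)
    if acc > best_val_acc:
        best_val_acc, best_model_seed = acc, seed

# Evaluate the chosen "model" on data it was never selected on:
# its validation score was an illusion created by reusing the set.
model_rng = random.Random(best_model_seed)
test_preds = [model_rng.randint(0, 1) for _ in range(n_test)]
test_acc = accuracy(test_preds, test_labels)
```

Running this, `best_val_acc` lands well above 0.5 while `test_acc` stays near 0.5, even though no model ever learned anything.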
One way to solve this problem is to collect more data and refresh the test and validation sets. By starting fresh with new data, you get a better picture of how well your model performs on unfamiliar data. Another option is to randomly re-partition the data into training, testing, and validation sets for each model, rather than using the same partitions for all. This ensures that each model is evaluated against different data.
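As a sketch of the re-partitioning idea in plain Python (the function name `three_way_split` and the 60/20/20 fractions are my own choices, not from the course): giving each model its own seed yields a different partition per model.

```python
import random

def three_way_split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Randomly partition `data` into train/validation/test subsets.

    A fresh `seed` per model gives each one a different partition,
    so no single validation set is reused for every decision.
    """
    rng = random.Random(seed)
    shuffled = data[:]              # copy so the original stays intact
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Different seeds -> different partitions for different models.
data = list(range(100))
train_a, val_a, test_a = three_way_split(data, seed=1)
train_b, val_b, test_b = three_way_split(data, seed=2)
```

Note this only spreads the reuse around; if the underlying pool of data never changes, repeated decisions still leak information about it, which is why collecting genuinely new data remains the stronger fix.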

I hope this answers your question.
Muhammad John Abbas


In addition to the previous answer: according to CRISP-DM, eventually we want to deploy our model and operate it to solve our business problem.
As @Muhammad_John_Abbas correctly mentioned, the performance of the model on test data (which was never seen before) is your litmus test: it has to pass for you to have sufficient evidence that your machine learning pipeline performs well enough on new data. If it passes, you can go for deployment, e.g. in the cloud.

In reality you also have:

  • distribution shifts of data (think of traffic / remote work during the pandemic situation in 2020)
  • IoT swarm-intelligence applications, which are designed as self-learning systems that get better and more powerful over time

This means in reality you have to deploy often, e.g. after a new training run in which you incorporated new knowledge into your model.

Also in this situation you need to conduct your litmus test, before deployment and operations, with test data that was never seen before. (A test data set from a previous litmus test would be an issue, since its information already made it into the previous process and is not actually “new”.)
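One way to operationalize “never seen before” across deployment cycles is to track which samples have already served in a litmus test and draw each new test batch only from the remainder. A minimal sketch, assuming a simple in-memory sample pool (the function `fresh_test_batch` and the data layout are hypothetical, mine alone):

```python
import random

def fresh_test_batch(pool, used_ids, batch_size, seed=0):
    """Draw a test batch from `pool`, excluding samples already used
    in a previous litmus test, then record the new batch as used."""
    rng = random.Random(seed)
    candidates = [s for s in pool if s["id"] not in used_ids]
    batch = rng.sample(candidates, batch_size)
    used_ids.update(s["id"] for s in batch)
    return batch

# Two deployment cycles, each evaluated on samples the pipeline
# has never used for a litmus test before.
pool = [{"id": i, "x": i * 0.1} for i in range(100)]
used = set()
batch1 = fresh_test_batch(pool, used, 20, seed=1)
batch2 = fresh_test_batch(pool, used, 20, seed=2)
# batch2 shares no samples with batch1
```

In practice the pool would be replenished with newly collected production data between cycles, so the test set also tracks any distribution shift.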

I hope this view helps, @mehmet_baki_deniz. Have a good one!

Best regards


Thank you both for the answers.
What about re-partitioning the data for each model test, @Christian_Simonis? Do you think this would solve the issue?

Hi @mehmet_baki_deniz

Just to understand you correctly: by re-partitioning, do you mean re-shuffling and splitting the data anew for each model?

As long as you have solid evidence that your test set is realistic and representative of what the model will see in production, and the test set is truly new (i.e. you did not use it for training or for tuning features or hyperparameters), this should be fine.

Best regards
