Is it logical to mix and match training and test sets once a model is selected?

Hi,
Once a model and its hyperparameters have been selected, should we re-train on the entire dataset before sending it out into the real world?
I read this in a book and it seemed logical to me, since more data means less overfitting.
I'd like to hear your insights on the topic.

Here is the quote from the book (Corey Wade, Hands-On Gradient Boosting with XGBoost and scikit-learn):

'When testing, it's important not to mix and match training and test sets. After a final model has been selected, however, fitting the model on the entire dataset can be beneficial. Why? Because the goal is to test the model on data that has never been seen, and fitting the model on the entire dataset may lead to additional gains in accuracy.'
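A minimal sketch of the workflow the quote describes, using scikit-learn's `GradientBoostingClassifier` on a synthetic dataset. The dataset, split sizes, and hyperparameter grid are illustrative assumptions, not from the book:

```python
# Sketch: select hyperparameters on the training set, evaluate once on
# the held-out test set, then refit the chosen model on ALL the data.
# Dataset and grid values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 1. Tune hyperparameters using cross-validation on the training set only.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"max_depth": [2, 3], "n_estimators": [50, 100]},
    cv=3,
)
search.fit(X_train, y_train)

# 2. Touch the test set exactly once, to estimate real-world performance.
print("test accuracy:", search.score(X_test, y_test))

# 3. Final model for deployment: refit the winning configuration on the
#    entire dataset, as the quote suggests.
final_model = GradientBoostingClassifier(
    random_state=0, **search.best_params_
).fit(X, y)
```

The key point is ordering: the test-set score is recorded before the final refit, so the refit on all data cannot leak into the reported estimate.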

Makes sense to do as the book says.


I agree, but I wonder whether it is best to train from scratch on the combined train+val+test data, or to treat it more like a transfer learning problem: refining the weights learned on the train set with the (likely smaller) val and test sets. Starting from scratch seems to introduce the possibility of going off the rails, with no way to assess the model until it underperforms in production.
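One way to sketch the "refine rather than restart" idea for boosted trees is scikit-learn's `warm_start` flag, which keeps the trees already fitted and only adds new boosting rounds. This is a stand-in for the transfer-learning analogy, not a method from the book, and the split sizes and round counts are assumptions:

```python
# Sketch: keep the ensemble learned on the training split, then continue
# boosting on the full dataset instead of refitting from scratch.
# Round counts and split sizes are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit the selected configuration on the training split only.
model = GradientBoostingClassifier(
    n_estimators=100, warm_start=True, random_state=0
)
model.fit(X_train, y_train)

# "Refine": with warm_start=True, raising n_estimators and calling fit
# again keeps the existing 100 trees and fits 20 new rounds on all the
# data, rather than discarding what was learned on the train split.
model.n_estimators += 20
model.fit(X, y)
print(len(model.estimators_))  # 120 rounds total
```

XGBoost offers a similar continuation mechanism (passing an existing booster to `xgb.train` via the `xgb_model` argument), which is closer to what you would use with the book's library of choice.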


Yes, it should be transfer learning; after all, as you say, training from scratch might overfit and underperform in production. At least with the val/test splits you avoid some of the overfitting on this dataset.