How to Choose a Subset for Initial Model Training?

neverstoppredicting · October 20, 2024, 10:01am

If you have 1 million rows and want to train a model using a smaller subset to test its effectiveness, how many rows would you start with?

TMosh · October 20, 2024, 3:30pm

An answer requires more information.
How many features are in each example?

neverstoppredicting · October 20, 2024, 8:55pm

Oh, good question! Let’s say 100.

TMosh · October 20, 2024, 9:37pm

A rule of thumb is that you want m >> n (m = number of training examples, n = number of features) to minimize the effects of overfitting.
So with n = 100, I’d start with maybe 1000 randomly selected examples. See how that goes, and increase the number if you see overfitting during training.
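A minimal Python/NumPy sketch of the suggestion above, assuming the full dataset sits in NumPy arrays. The starting size uses m = 10 * n (the 1000-for-100-features ratio from this reply, a heuristic rather than a rule), rows are drawn uniformly at random without replacement, and the subset can later be doubled while keeping the rows already used. The function names and the default ratio, factor, and seed values are hypothetical, chosen only for illustration.

```python
import numpy as np


def initial_subset_size(n_features, ratio=10, n_total=None):
    """Starting number of examples m from the heuristic m = ratio * n.

    ratio=10 mirrors "1000 examples for 100 features"; it is a rule of
    thumb (assumption), not a guarantee against overfitting.
    """
    m = ratio * n_features
    if n_total is not None:
        m = min(m, n_total)  # never ask for more rows than exist
    return m


def random_subset_indices(n_total, m, seed=0):
    """Draw m distinct row indices uniformly at random (no replacement)."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_total, size=m, replace=False)


def grow_subset(n_total, current_idx, factor=2, seed=1):
    """Enlarge the subset by `factor`, keeping every row already used.

    New rows are drawn only from rows not yet in the subset, so the
    larger subset always contains the smaller one.
    """
    target = min(factor * len(current_idx), n_total)
    extra = target - len(current_idx)
    remaining = np.setdiff1d(np.arange(n_total), current_idx)
    rng = np.random.default_rng(seed)
    new_idx = rng.choice(remaining, size=extra, replace=False)
    return np.concatenate([current_idx, new_idx])


# Example with the numbers from this thread (X, y would be your full arrays).
n_total, n_features = 1_000_000, 100
m = initial_subset_size(n_features, n_total=n_total)
idx = random_subset_indices(n_total, m)
# X_sub, y_sub = X[idx], y[idx]
# ...train, then compare training vs. validation error...
bigger_idx = grow_subset(n_total, idx)  # 2000 rows, if you decide to grow
```

With 1 million rows and 100 features, initial_subset_size returns 1000. Whether to call grow_subset is a judgment you make from your own training and validation results; the sketch only handles the sampling.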

neverstoppredicting · October 20, 2024, 11:11pm

Thank you for your response!

So does this mean that if I have images with n = 10k features, I might need 100k examples to start?
