If you have 1 million rows and want to train a model using a smaller subset to test its effectiveness, how many rows would you start with?
An answer requires more information.
How many features are in each example?
Oh, good question. Let’s say 100.
A rule of thumb is that you want m >> n, i.e. many more examples (m) than features (n), to minimize the effects of overfitting.
So with n = 100, I’d start with maybe 1,000 randomly selected examples. See how that goes, and increase the number if you see overfitting during training.
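If it helps, here is a rough Python sketch of that starting point. The factor of 10 is only my reading of "m >> n", not a fixed rule, and the helper names `starting_subset_size` and `sample_row_indices` are made up for illustration.

```python
import random


def starting_subset_size(n_features, total_rows, factor=10):
    """Heuristic starting size: factor * n_features, capped at total_rows.

    The factor of 10 is an assumption, not a standard value.
    """
    return min(total_rows, factor * n_features)


def sample_row_indices(total_rows, subset_size, seed=0):
    """Pick subset_size distinct row indices uniformly at random."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(range(total_rows), subset_size)


# 1 million rows, 100 features per example
m = starting_subset_size(n_features=100, total_rows=1_000_000)
idx = sample_row_indices(total_rows=1_000_000, subset_size=m)
print(m)  # 1000
```

You would then train on the rows in `idx`, compare training error against error on held-out rows, and raise `factor` if the gap between them is large.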
Thank you for your response!
So does this mean that if I have images with n = 10k, I might need 100k examples to start?