Should a training dataset be big?

Hello everyone, I hope to get an answer to my question: I have a dataset with 10,000 samples and I want to develop a classification or regression ML model. Should I use the whole dataset (all 10,000 samples) to build the model, or should I select only some of the samples? If I train on only some of them, what technique should I use to select the samples?

Tarek

This is covered later in the course.

Generally you need three subsets of your data.

  • A training set.
  • A validation set (used for adjusting any hyperparameters, such as the regularization value).
  • A test set (used as a final test of the completed system).

A good split between them is 60% training, 20% validation, and 20% test.
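To make the split concrete, here is a minimal sketch of how it could be done for 10,000 samples. It assumes the samples are picked by random shuffling, which is my assumption and not something the course has specified at this point. The function name `split_dataset` and its parameters are hypothetical, chosen for illustration only.

```python
import random


def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of `samples` and cut it into train/validation/test lists.

    Hypothetical helper for illustration; random shuffling is an assumption.
    Whatever is left after the train and validation cuts becomes the test set.
    """
    shuffled = list(samples)               # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


samples = list(range(10_000))  # stand-in for 10,000 real samples
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))  # 6000 2000 2000
```

Every sample lands in exactly one subset, so the model is never tuned or tested on data it was trained on.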

For now, the course is only discussing training, so we don’t consider validation and test yet.


Hi @Tarek_Rashed, this is addressed in detail later in this course and in the next course, with different scenarios such as 10,000 samples, a million samples, or just a few hundred.
