Should a training dataset be big?

Tarek_Rashed · July 23, 2022, 11:22pm

Hello everyone, I hope to get an answer to my question: I have a dataset with 10,000 samples and I want to develop a classification or regression ML model. Should I use the whole dataset (all 10,000 samples) to build the model, or should I select only a few samples? If only a few samples are selected to train my model, what technique should I use to select them?

Tarek

TMosh · July 24, 2022, 2:18am

This is covered later in the course.

Generally you need three subsets of your data.

A training set.
A validation set (used for adjusting any hyperparameters, such as the regularization value).
A test set (used as a final test of the completed system).

A good starting split between them is 60%-20%-20% (training, validation, test).
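As a rough illustration of the 60%-20%-20% split, here is a minimal sketch in plain Python. The function name `split_dataset`, its parameters, and the stand-in data are hypothetical and are not taken from the course material. It assumes the samples are independent, so a random shuffle is a reasonable way to choose which samples go where.

```python
import random


def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the samples and split them into train / validation / test lists.

    Whatever is left after the train and validation portions goes to test.
    """
    indices = list(range(len(samples)))
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)

    n_train = round(train_frac * len(samples))
    n_val = round(val_frac * len(samples))

    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:n_train + n_val]]
    test = [samples[i] for i in indices[n_train + n_val:]]
    return train, val, test


# Stand-in for a dataset with 10,000 samples.
data = list(range(10_000))
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

Every sample lands in exactly one subset, so all 10,000 are used; none are thrown away. Libraries such as scikit-learn provide `train_test_split` for the same purpose, which can be called twice to get three subsets.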

For now, the course is only discussing training, so we don’t consider validation and test yet.

sangdinh · July 24, 2022, 3:43am

Hi @Tarek_Rashed, later in this course and in the next course, this is addressed in detail, with different scenarios such as 10,000 samples, a million samples, or just a few hundred.
