In the mini-batch video, it is mentioned that when there are too many examples, it is sometimes better to use only a subset of them in each step of gradient descent, for example.
Would it be possible to train a model to choose good examples from our data, so that we don't need to use the full 100,000,000 examples?
More generally, why do we need lots of data to train a model? Is there a mathematical rationale?
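For reference, here is a minimal sketch of what I understood mini-batch gradient descent to be (a toy linear-regression example in NumPy; the dataset, batch size, and learning rate are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 100,000 examples of a noisy linear relationship (invented numbers).
X = rng.normal(size=(100_000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100_000)

w = np.zeros(3)          # model parameters
lr = 0.1                 # learning rate
batch_size = 64          # examples used per gradient step

for step in range(2_000):
    # Mini-batch: each step uses only `batch_size` randomly chosen examples,
    # not the whole dataset.
    idx = rng.integers(0, len(X), size=batch_size)
    Xb, yb = X[idx], y[idx]
    # Gradient of the mean squared error on the mini-batch.
    grad = (2.0 / batch_size) * Xb.T @ (Xb @ w - yb)
    w -= lr * grad

print("learned weights:", np.round(w, 2))  # should be close to [2, -1, 0.5]
```

Each step touches only 64 examples instead of all 100,000, which is why the method scales to huge datasets, and this is what made me wonder whether the subset could be chosen more cleverly than at random.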
The reason we need more examples in ML is that the more data we have, the more likely it is that our samples cover all the cases that can occur in that situation. The problem in the real world is that we don't have enough data to describe every possible outcome, so the more we collect, the better. In short, we want to capture as much of the possible variety as we can, so the model is robust to new data.
Just to add to the previous reply: adding more data generally helps reduce a problem of high variance (overfitting). If the model has high bias (it underfits, i.e. it inherently lacks the flexibility to fit the data), then training with more data is unlikely to help much.
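To make that concrete, here is a toy NumPy illustration (not from the course; the model, noise level, and sample sizes are invented). The same flexible model fit on only a few points does much worse on fresh data than when it is fit on many points, which is the high-variance situation where extra data helps:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a sine wave on [-1, 1] (toy data, noise std 0.3)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=n)
    return x, y

def test_mse(n_train, degree=9):
    """Fit a flexible degree-9 polynomial on n_train points, evaluate on fresh data."""
    x_tr, y_tr = make_data(n_train)
    coeffs = np.polyfit(x_tr, y_tr, degree)
    x_te, y_te = make_data(5_000)
    return np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)

# With few points the flexible model overfits (test MSE well above the ~0.09
# noise floor); with many points the test MSE approaches the noise floor.
for n in (15, 100, 10_000):
    print(f"n_train={n:6d}  test MSE={test_mse(n):.3f}")
```

The model's flexibility (the polynomial degree) is unchanged in all three runs; only the amount of training data changes, which is why the improvement here is a variance reduction rather than a bias reduction.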
(Reference: Course 2, Week 3 in the ML Specialization)