In the mini-batch video, it is mentioned that when there are too many examples, it is sometimes better to use only a subset of them in each step of gradient descent, for example.
Would it be possible to train a model to choose good examples from our data, so that we don't need to use the full 100,000,000 examples?
More generally, why do we need lots of data to train a model? Is there a mathematical rationale?
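For reference, here is a minimal sketch of what I understood mini-batch gradient descent to be (a toy linear-regression example in NumPy; the dataset, batch size, and learning rate are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 100,000 examples of a noisy linear relationship (invented numbers).
X = rng.normal(size=(100_000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100_000)

w = np.zeros(3)          # model parameters
lr = 0.1                 # learning rate
batch_size = 64          # examples used per gradient step

for step in range(2_000):
    # Mini-batch: each step uses only `batch_size` randomly chosen examples,
    # not the whole dataset.
    idx = rng.integers(0, len(X), size=batch_size)
    Xb, yb = X[idx], y[idx]
    # Gradient of the mean squared error on the mini-batch.
    grad = (2.0 / batch_size) * Xb.T @ (Xb @ w - yb)
    w -= lr * grad

print("learned weights:", np.round(w, 2))  # should be close to [2, -1, 0.5]
```

Each step touches only 64 examples instead of all 100,000, which is why the method scales to huge datasets, and this is what made me wonder whether the subset could be chosen more cleverly than at random.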
The reason we need more examples in ML is that the more data we have, the more likely it is that our samples cover all the cases that can occur in that situation. The problem in the real world is that we don't have enough data to describe every possible outcome, so the more we collect, the better. In short, we want to capture as much of the possible variety as we can, so the model is robust to new data.
Just to add to the previous reply: adding more data generally helps reduce a problem of high variance (overfitting). If the model has high bias (it underfits, i.e. it inherently lacks the flexibility to fit the data), then training with more data is unlikely to help much.
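To make that concrete, here is a toy NumPy illustration (not from the course; the model, noise level, and sample sizes are invented). The same flexible model fit on only a few points does much worse on fresh data than when it is fit on many points, which is the high-variance situation where extra data helps:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a sine wave on [-1, 1] (toy data, noise std 0.3)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=n)
    return x, y

def test_mse(n_train, degree=9):
    """Fit a flexible degree-9 polynomial on n_train points, evaluate on fresh data."""
    x_tr, y_tr = make_data(n_train)
    coeffs = np.polyfit(x_tr, y_tr, degree)
    x_te, y_te = make_data(5_000)
    return np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)

# With few points the flexible model overfits (test MSE well above the ~0.09
# noise floor); with many points the test MSE approaches the noise floor.
for n in (15, 100, 10_000):
    print(f"n_train={n:6d}  test MSE={test_mse(n):.3f}")
```

The model's flexibility (the polynomial degree) is unchanged in all three runs; only the amount of training data changes, which is why the improvement here is a variance reduction rather than a bias reduction.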
(Reference: Course 2, Week 3 in the ML Specialization)