Sanity check before scaling up

In Course 1, Lesson 2, Andrew mentions that we can try to overfit a very small training set before spending days training the algorithm on a large dataset. He gives an example of training on a single audio clip, where the model's output was just spaces rather than a correct transcription. He says: “Clearly it wasn’t working and because my speech system couldn’t even accurately transcribe one training example, there wasn’t much point to spending hours and hours training it on a giant training set.”

Would it be reasonable to train a model on one sample (one audio clip / image) and expect good performance? Is one data point enough to tune and learn the hyperparameters? Thanks.

Hi @biowilliam and welcome to the course!

If you have only one sample and put it through one forward pass and backward pass of the training process, the model parameters are updated just once from their random initialization. It is hard to expect a good result from a single parameter update. Of course, the model will learn something from that one data point, but whether that is enough depends on the accuracy you expect. The point of Andrew's sanity check is different: run many updates on that one example and verify the model can drive the training loss to (near) zero, i.e. memorize it. If it cannot even do that, the pipeline is broken and a big training run is pointless.
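Here is a minimal NumPy sketch of that sanity check, using a toy logistic-regression "model" on one made-up example (the feature vector, learning rate, and step count are arbitrary choices for illustration; in practice you would run your actual model and data through the same loop):

```python
import numpy as np

# Sanity check: repeatedly fit a tiny model on a single training example.
# If the training loss does not approach zero, something in the pipeline
# (data, loss, gradients, optimizer) is broken.

rng = np.random.default_rng(0)
x = rng.normal(size=4)   # one "training example" with 4 features (toy data)
y = 1.0                  # its label

w = np.zeros(4)          # model parameters, "randomly" initialized
b = 0.0
lr = 0.5                 # learning rate (arbitrary for this toy)

def loss_and_grads(w, b):
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))                  # sigmoid
    p = np.clip(p, 1e-12, 1 - 1e-12)              # numerical safety
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))  # binary cross-entropy
    dz = p - y                                    # dLoss/dz for BCE + sigmoid
    return loss, dz * x, dz                       # loss, dw, db

for step in range(500):                           # many updates on ONE example
    loss, dw, db = loss_and_grads(w, b)
    w -= lr * dw
    b -= lr * db

print(f"final training loss on the single example: {loss:.4f}")
# The loss should be close to 0; the model has memorized the one example.
```

Passing this check says nothing about generalization, which is the questioner's concern. It only confirms that the training machinery works before you pay for a long run on the full dataset.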

Beyond that, the general trend holds: the more data you have, the better the accuracy you can usually reach.