In Course 1, Lesson 2, Andrew mentioned that we can try to overfit a very small training dataset before spending days training the algorithm on a large one. He then gave an example of training on a single audio clip, where the model output only spaces instead of a correct transcription. He says: "Clearly it wasn't working, and because my speech system couldn't even accurately transcribe one training example, there wasn't much point to spending hours and hours training it on a giant training set."
Would it be reasonable to expect good performance from a model trained on one sample (one audio clip or image)? Is a single data point enough to tune and learn the hyperparameters? Thanks.
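For concreteness, here is a minimal sketch of the sanity check the lecture describes (my own illustration, not code from the course): train a tiny model on a single example and verify the training loss goes to roughly zero. The point is not to get a model that generalizes, only to confirm the training pipeline can fit one data point at all.

```python
import numpy as np

# Sanity check: a model with enough capacity should drive the training
# loss to ~0 on a single example. If it cannot, the training code is
# likely buggy, so a long run on the full dataset would be wasted.

x = np.array([[0.5, -1.0, 2.0, 0.3]])  # one "training example", 4 features
y = np.array([[1.0]])                  # its target

W = np.zeros((4, 1))                   # tiny linear model
b = np.zeros((1, 1))
lr = 0.05

for _ in range(500):
    pred = x @ W + b
    grad = 2.0 * (pred - y)            # d(MSE)/d(pred)
    W -= lr * (x.T @ grad)             # plain gradient descent
    b -= lr * grad

loss = float((x @ W + b - y) ** 2)
print(f"loss on the single example: {loss:.2e}")
assert loss < 1e-9, "cannot fit even one example -- debug before scaling up"
```

If even this tiny fit fails, the problem is in the code (data loading, loss, gradients), not in the amount of data, which is exactly the diagnostic value of the one-example test.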