How small is the 'small training set'?

In one of the week 1 videos of the Neural Networks and Deep Learning course, Andrew says that on small training sets there is not much difference between the performance of traditional and deep learning approaches. How small is the 'small training set'? Below what data size can I skip deep learning, because traditional learning algorithms will be equally good?

This is not something that can be measured with the size of the training set only.

You have other factors to consider, such as: the problem to be solved, how representative the training set is of the problem, the nature of the real-world test set, the performance of deep learning vs. other algorithms, computational resource usage, etc.
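Because the answer depends on these factors, a practical way to find out where "small" ends for your own problem is to measure it: train each candidate model on growing subsets of the data and compare their test scores (a learning curve). This is a minimal sketch under stated assumptions: a toy two-blob dataset and a nearest-centroid classifier standing in for a "traditional" model. `make_data`, `NearestCentroid` and `learning_curve` are hypothetical names for illustration; a neural network would be plugged in through the same `fit`/`predict` interface.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Tuple[float, float], int]  # ((x1, x2), label)


def make_data(n: int, seed: int = 0) -> List[Sample]:
    """Hypothetical toy task: two 2-D Gaussian blobs centred at (-1, -1) and (1, 1)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # alternate labels so both classes are equally represented
        centre = 1.0 if label else -1.0
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    rng.shuffle(data)
    return data


class NearestCentroid:
    """Deliberately simple 'classic' baseline; stands in for any traditional ML model."""

    def fit(self, samples: Sequence[Sample]) -> "NearestCentroid":
        # Store the mean point of each class seen in the training subset.
        self.centroids = {}
        for label in {y for _, y in samples}:
            points = [x for x, y in samples if y == label]
            self.centroids[label] = (
                sum(p[0] for p in points) / len(points),
                sum(p[1] for p in points) / len(points),
            )
        return self

    def predict(self, x: Tuple[float, float]) -> int:
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            self.centroids,
            key=lambda c: (x[0] - self.centroids[c][0]) ** 2
            + (x[1] - self.centroids[c][1]) ** 2,
        )


def accuracy(model, samples: Sequence[Sample]) -> float:
    """Fraction of samples the model labels correctly."""
    return sum(model.predict(x) == y for x, y in samples) / len(samples)


def learning_curve(model_factory: Callable, train, test, sizes) -> List[Tuple[int, float]]:
    """Train a fresh model on the first n training samples for each n; report test accuracy."""
    return [(n, accuracy(model_factory().fit(train[:n]), test)) for n in sizes]


if __name__ == "__main__":
    train, test = make_data(1000, seed=1), make_data(500, seed=2)
    for n, acc in learning_curve(NearestCentroid, train, test, [10, 100, 1000]):
        print(f"n={n:5d}  accuracy={acc:.3f}")
```

Running the same loop with a deep learning model and plotting both curves shows where, if anywhere, the deep model pulls ahead on your data. That crossover point is the empirical "small training set" for that particular problem, not a universal number.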

Yes, Gent is absolutely right. Also, I remember that @Christian_Simonis has written some posts on this topic. Am I right, Christian? If so, kindly share them here with @Adilbek_Salimgereyev.


Welcome to the community, @Adilbek_Salimgereyev: Good question!

In addition to @gent.spah's great reply:
How much data is needed to solve a specific problem can be quite challenging to determine, given the trade-off between data acquisition cost and technical excellence. One way to quantify the (expected) information, e.g. via (Shannon) entropy approaches, is active learning (AL), where model uncertainty can be utilized; see also: How much data does a CNN need to learn? - #2 by Christian_Simonis

So, active learning can help to quantify:

  • which label is expected to provide a valuable benefit and also
  • when a sufficient amount of data has been used to train your model.
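As a rough illustration of the entropy idea, this minimal sketch scores each unlabeled sample by the Shannon entropy of the model's predicted class probabilities, H(p) = -Σ p_i log2 p_i, requests labels for the most uncertain samples first, and uses an entropy threshold as a simple stopping rule. The function names, the use of bits, and the threshold rule are assumptions for illustration, not the method from the linked posts; the probabilities would come from whatever model you are training.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy in bits of one predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs: Sequence[Sequence[float]], k: int = 1) -> List[int]:
    """Indices of the k unlabeled samples whose predictions have the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: shannon_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


def should_stop(pool_probs: Sequence[Sequence[float]], threshold: float = 0.1) -> bool:
    """Hypothetical stopping rule: stop labeling once no sample is above the entropy threshold."""
    if not pool_probs:
        return True  # nothing left to label
    return max(shannon_entropy(p) for p in pool_probs) < threshold
```

For example, with pool probabilities `[[0.98, 0.02], [0.5, 0.5], [0.7, 0.3]]`, `select_most_uncertain(pool, k=2)` returns `[1, 2]`: the 50/50 sample (1 bit) is queried first, then the 70/30 one (about 0.88 bits). In a real loop you would retrain after each batch of new labels and recompute the probabilities before selecting again.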

This thread on The Batch could be interesting for you if you want to dig deeper into AL.
Many thanks, @saifkhanengr, for the hint!

Best regards


Regarding the question of when to go for classic ML or deep learning, I'm afraid there is no crystal-clear decision boundary either, but in general, points to consider are:

  • the more unstructured your data is (e.g. images, video, text, …)
  • the more complex and abstract your problem is (e.g. face recognition in a video sequence)
  • the more data you have (ideally with high-quality labels)

… the higher the potential of deep learning should be, since DNNs with advanced architectures (like transformers, but also architectures with convolutional and pooling layers) are designed to perform well on very large datasets and to process highly unstructured data like pictures or videos in a scalable way: basically, the more data, the merrier!

Compared to classic ML models, DNNs impose less predefined structure and can therefore learn more complex and abstract relationships, given a sufficient quantity of sufficiently high-quality data; see also this thread.

Hope that helps!

Best regards