Estimating the Quantity of Training Data

How do I determine how much data is sufficient to train a model? More is better, I get that, of course. But how do I know how much is enough?

Hello @jay_five! Welcome to our community.

That is an interesting question, but there is no fixed answer. The amount of data you need depends on the nature of the problem, the complexity of the model, and several other factors.

However, one way to estimate it is to train your model on increasingly large subsets of the available data and plot its performance (such as validation accuracy or loss) against the number of training examples used, with the amount of data on the x-axis and performance on the y-axis. This kind of plot is usually called a learning curve. If adding more data still leads to a significant improvement in performance, collecting more is probably worthwhile. If the curve has flattened out and adding more data yields no noticeable improvement, it may not be necessary to collect or use additional data.
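
A minimal sketch of this learning-curve check, using only the Python standard library. The synthetic two-class data, the nearest-centroid classifier, the subset sizes and the `min_gain` threshold are all hypothetical stand-ins for your own dataset, model and idea of a "noticeable improvement". In practice you would plot the (size, accuracy) pairs, e.g. with matplotlib, rather than print them.

```python
import random


def make_data(n, seed):
    """Hypothetical synthetic data: two Gaussian classes centred at (-1, -1) and (1, 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def fit_centroids(train):
    """Stand-in model (nearest centroid): store the mean point of each class seen in train."""
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for (x0, x1), y in train:
        s = sums[y]
        s[0] += x0
        s[1] += x1
        s[2] += 1
    return {y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items() if s[2] > 0}


def accuracy(centroids, data):
    """Fraction of points whose closest centroid has the correct label."""
    correct = 0
    for (x0, x1), y in data:
        pred = min(
            centroids,
            key=lambda c: (x0 - centroids[c][0]) ** 2 + (x1 - centroids[c][1]) ** 2,
        )
        correct += pred == y
    return correct / len(data)


def learning_curve(train, val, sizes):
    """Train on the first n examples for each n and score on a fixed validation set."""
    return [(n, accuracy(fit_centroids(train[:n]), val)) for n in sizes]


def has_plateaued(curve, min_gain=0.01):
    """True if the last increase in training-set size improved accuracy by less than min_gain."""
    (_, prev), (_, last) = curve[-2], curve[-1]
    return last - prev < min_gain


if __name__ == "__main__":
    train = make_data(5000, seed=0)
    val = make_data(1000, seed=1)  # held-out data, never used for training
    curve = learning_curve(train, val, [20, 50, 100, 500, 1000, 5000])
    for n, acc in curve:
        print(f"{n:>5} examples -> validation accuracy {acc:.3f}")
    print("plateaued:", has_plateaued(curve))
```

Scoring every subset on the same validation set keeps the points on the curve comparable. If you use scikit-learn, `sklearn.model_selection.learning_curve` computes this kind of curve for you, with cross-validation.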

I hope this helps! If you have any further questions, feel free to ask.



Hi @jay_five

In addition to @saifkhanengr’s answer: this thread created by @ai_curious may be interesting for you, since it touches on a similar topic for CNNs: How much data does a CNN need to learn? - #2 by Christian_Simonis

Best regards


Thank you very much!

I updated my experiments mentioned by @Christian_Simonis above. You can find the new thread here: How much data does a CNN need to learn - continuation

Notice that it features the accuracy/loss plots mentioned by @saifkhanengr, though to keep my code simple I just collected the separate plots from my several runs and didn’t combine them into a single graphic (cuz I’m lazy like that). Hope it helps!