Any best practices on how much data is sufficient for ML/DL?

At various times Prof. Ng mentions “have sufficient data, large enough data set”, etc.

How much is sufficient or large enough? It depends on many things, of course, but where can one find some heuristics or guidelines?

Hi there, I think one way to check whether the dataset is sufficiently large is to look at the training and test accuracy. There is also a ‘10 times rule’ of thumb for estimating the minimum sample size from the model’s degrees of freedom.
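
As a rough illustration of that rule, here is a minimal sketch. I am assuming “degrees of freedom” can be approximated by the number of input features, which is only one of several possible definitions:

```python
# A hypothetical sketch of the "10 times rule": require at least
# 10 training examples per degree of freedom, here approximated by
# the number of input features (an assumption; other definitions exist).
def min_samples_10x_rule(n_features: int, factor: int = 10) -> int:
    """Rough lower bound on training-set size for a model with n_features inputs."""
    return factor * n_features

# e.g. a dataset with 50 features would call for roughly 500 examples
print(min_samples_10x_rule(50))  # -> 500
```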


Theoretically, the feature space grows exponentially as each new feature is added, and so does the number of samples required to cover that space in a statistically sound way. In practice, however, it is often unrealistic to collect data at that rate of growth. I therefore agree with @kchong37 that a good metric result is how you know your sample size is large enough with respect to the model assumptions you made. Since this is a very empirical question, I spent 3 minutes on a Google search (I am lazy) and found this article, which experimented with different settings of “number of samples per degree of freedom” on the author’s dataset. However, I have not yet found any robust conclusion on the question you asked.
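
To make that empirical check concrete, here is a minimal sketch of plotting model performance against training-set size (a learning curve). The dataset and model are placeholders I chose for illustration; the idea is simply that if the validation score is still rising steeply at the full dataset size, more data would likely help, whereas a flat curve suggests the dataset is already “large enough” for that particular model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Toy dataset and simple model, both stand-ins for your own problem.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Evaluate cross-validated accuracy at increasing training-set sizes.
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5,
)

for n, score in zip(train_sizes, val_scores.mean(axis=1)):
    print(f"{int(n):5d} training examples -> mean CV accuracy {score:.3f}")
```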

Raymond