I’ve really enjoyed many of the courses and specializations, but one thing I think has been lacking is a discussion of some basic statistics. For example, in the book “Machine Learning Yearning” it’s stated that
“With 10,000 examples, you will have a good chance of detecting an improvement of 0.1%.”
With the example of a classifier with a true accuracy of 0.9 and 10,000 examples, the standard deviation of the number of correct predictions, based on a binomial distribution, is sqrt(np(1-p)) = sqrt(10000 × 0.9 × 0.1) = 30, which is 0.3% of n. The measured accuracy therefore has a standard deviation of about 0.3%, so an improvement of 0.1% is well within the noise.
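For anyone who wants to check the arithmetic, here is a minimal sketch in plain Python (n and p are taken from the example above):

```python
import math

n = 10_000  # test-set size
p = 0.9     # true accuracy of the classifier

# Standard deviation of the *count* of correct predictions (binomial)
sd_count = math.sqrt(n * p * (1 - p))  # sqrt(10000 * 0.9 * 0.1) = 30.0

# Standard deviation of the *measured accuracy* (count / n)
sd_accuracy = sd_count / n             # 30 / 10000 = 0.003, i.e. 0.3%

print(f"SD of correct count:     {sd_count:.1f} examples")
print(f"SD of measured accuracy: {sd_accuracy:.4f} ({sd_accuracy:.1%})")
```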
I can understand that this seems like a bit of an academic argument (and I am an academic). In my case, I work with brain imaging datasets with n = 100 to 10,000. If I have two algorithms with accuracies of 70% and 75%, it is often difficult to be confident that one is significantly better than the other. I see many people, even in academic papers, claiming subtle improvements that are not justified by the data.
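To make this concrete, here is a sketch of the kind of significance check I have in mind, using a standard two-proportion z-test (this assumes the two algorithms were evaluated on independent samples; on a shared test set a paired test such as McNemar’s would be more appropriate, and the function name and sample sizes here are just for illustration):

```python
import math
from scipy.stats import norm

def accuracy_z_test(acc1, acc2, n1, n2):
    """Two-sided z-test for the difference between two measured accuracies."""
    # Pooled accuracy under the null hypothesis of no real difference
    p_pool = (acc1 * n1 + acc2 * n2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (acc2 - acc1) / se
    return 2 * norm.sf(abs(z))  # two-sided p-value

# 70% vs 75% at the two ends of my typical dataset sizes:
for n in (100, 1000):
    print(n, accuracy_z_test(0.70, 0.75, n, n))
# n=100:  p ~ 0.43  -- no real evidence of a difference
# n=1000: p ~ 0.012 -- significant, but hardly overwhelming
```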
A useful heuristic is that the uncertainty in the measured accuracy scales as 1/sqrt(n), so you need to quadruple the size of your dev or test set to halve that uncertainty.
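A quick numerical illustration of the heuristic, under the same binomial assumption as above:

```python
import math

p = 0.9
for n in (10_000, 40_000):
    print(n, math.sqrt(p * (1 - p) / n))
# 10000 -> 0.0030 (0.30%)
# 40000 -> 0.0015 (0.15%): quadrupling n halves the uncertainty
```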
Maybe I should make a short course on statistics for ML?
