Statistics for machine learning

I’ve really enjoyed many of the courses and specializations, but one thing I think has been lacking is a discussion of some basic statistics. For example, the book “Machine Learning Yearning” states that

“With 10,000 examples, you will have a good chance of detecting an improvement of 0.1%.”

Take the example of a classifier with a true accuracy of p = 0.9 evaluated on n = 10,000 examples. Based on a binomial distribution, the standard deviation of the number of correct predictions is sqrt(np(1-p)) = sqrt(10000 × 0.9 × 0.1) = 30, i.e. 0.3% of n. An improvement of 0.1% is therefore well within the noise.
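The calculation above is easy to reproduce. A minimal sketch (the function name `accuracy_std` is just illustrative):

```python
import math

# Standard deviation of the *measured accuracy* for a classifier with
# true accuracy p evaluated on n independent examples (binomial model):
# std of correct count is sqrt(n*p*(1-p)), so std of the fraction is
# sqrt(p*(1-p)/n).
def accuracy_std(p, n):
    return math.sqrt(p * (1 - p) / n)

std = accuracy_std(0.9, 10_000)
print(f"std of measured accuracy: {std:.4f}")  # 0.0030, i.e. 0.3%
```

So a 0.1% improvement is only about a third of one standard deviation of the measurement itself.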

I can understand that this seems like a bit of an academic argument (and I am an academic). In my case, I work with brain imaging datasets with n = 100 to 10,000. If I have two algorithms with accuracies of 70% and 75%, it is often difficult to be confident that one is significantly better than the other. I see many people, even in academic papers, claiming subtle improvements that are not justified by the data.

A useful heuristic is that the uncertainty in the measured accuracy scales as 1/sqrt(n), so you need to quadruple the size of your dev or test set to halve the uncertainty.
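The 1/sqrt(n) scaling can be checked directly with the binomial formula; each 4× increase in the test set size halves the standard deviation:

```python
import math

# Std of measured accuracy under a binomial model, sqrt(p*(1-p)/n),
# evaluated at successive 4x increases in test-set size n.
p = 0.9
for n in (2_500, 10_000, 40_000):
    std = math.sqrt(p * (1 - p) / n)
    print(f"n={n:>6}: std of accuracy = {std:.4f}")
# Each quadrupling of n halves the std: 0.0060 -> 0.0030 -> 0.0015
```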

Maybe I should make a short course on statistics for ML?

hi @Richard_Watts

I guess when it comes to handling image datasets, one should not stick to accuracy alone, since there will be variance in the data distribution due to randomness. Checking for class imbalance and looking at precision, recall, and the F1 score, along with explainability techniques such as SHAP and LIME, would be more beneficial.

Remember that for some image classification models, even 85% accuracy may not be good enough when deployed on real-world data.

I am thinking twice about this. If I need a standard deviation to compare against 0.1%, which is \Delta\text{performance}, then since \text{performance} is a continuous value, shouldn’t its associated distribution be a continuous one? For that standard deviation, could its square (the variance) be something like the following?

This is what cross validation (CV) does.
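A minimal sketch of the idea, using simulated data (the true accuracy of 72% and the fold counts are made-up numbers for illustration): cross-validation gives you several fold-level accuracy estimates, and their spread is an empirical estimate of the measurement uncertainty.

```python
import random
import statistics

random.seed(0)

# Hypothetical classifier with true accuracy 72%, evaluated with
# 5-fold CV on n = 1000 examples. Each fold's accuracy is the fraction
# of simulated correct predictions (Bernoulli draws) in that fold.
true_acc, n, k = 0.72, 1000, 5
fold_size = n // k
fold_accs = [
    sum(random.random() < true_acc for _ in range(fold_size)) / fold_size
    for _ in range(k)
]
print("fold accuracies:", [f"{a:.3f}" for a in fold_accs])
print(f"mean = {statistics.mean(fold_accs):.3f}, "
      f"std across folds = {statistics.stdev(fold_accs):.3f}")
```

The fold-to-fold standard deviation here is a continuous, data-driven stand-in for the binomial formula above, which is why CV is a practical answer to the question about a continuous distribution for performance.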