Statistics for machine learning

I’ve really enjoyed many of the courses and specializations, but one thing I think has been lacking is a discussion of some basic statistics. For example, the book “Machine Learning Yearning” states that

“With 10,000 examples, you will have a good chance of detecting an improvement of 0.1%.”

Take the example of a classifier with a true accuracy of p = 0.9 evaluated on n = 10,000 examples. Based on a binomial distribution, the standard deviation of the number of correct predictions is sqrt(np(1-p)) = sqrt(10000 × 0.9 × 0.1) = 30, i.e. 0.3% of n. An improvement of 0.1% is therefore well within the noise.
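The calculation above is easy to reproduce. A minimal sketch (the function name `accuracy_std` is just illustrative):

```python
import math

# Standard deviation of the *measured accuracy* for a classifier with
# true accuracy p evaluated on n independent examples (binomial model):
# std of correct count is sqrt(n*p*(1-p)), so std of the fraction is
# sqrt(p*(1-p)/n).
def accuracy_std(p, n):
    return math.sqrt(p * (1 - p) / n)

std = accuracy_std(0.9, 10_000)
print(f"std of measured accuracy: {std:.4f}")  # 0.0030, i.e. 0.3%
```

So a 0.1% improvement is only about a third of one standard deviation of the measurement itself.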

I can understand that this seems like a bit of an academic argument (and I am an academic). In my case, I work with brain imaging datasets with n = 100 to 10,000. If I have two algorithms with accuracies of 70% and 75%, it is often difficult to be confident that one is significantly better than the other. I see many people, even in academic papers, claiming subtle improvements that are not justified by the data.

A useful heuristic is that the uncertainty in the measured accuracy scales as 1/sqrt(n), so you need to quadruple the size of your dev or test set to halve the uncertainty.
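The 1/sqrt(n) scaling can be checked directly with the binomial formula; each 4× increase in the test set size halves the standard deviation:

```python
import math

# Std of measured accuracy under a binomial model, sqrt(p*(1-p)/n),
# evaluated at successive 4x increases in test-set size n.
p = 0.9
for n in (2_500, 10_000, 40_000):
    std = math.sqrt(p * (1 - p) / n)
    print(f"n={n:>6}: std of accuracy = {std:.4f}")
# Each quadrupling of n halves the std: 0.0060 -> 0.0030 -> 0.0015
```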

Maybe I should make a short course on statistics for ML?

hi @Richard_Watts

I guess when it comes to handling image datasets, one should not stick to accuracy alone, since there will be variance in the data distribution due to randomness. Checking for class imbalance and looking at precision, recall, and the F1 score, along with explainability techniques such as SHAP and LIME, would be more beneficial.

Remember that for some image classification models, even 85% accuracy may not be good enough when deployed on real-world data.

I am thinking twice about this. If I need a standard deviation to compare against 0.1%, which is \Delta\text{performance}, then since \text{performance} is a continuous value, shouldn’t its associated distribution be a continuous one? For that standard deviation, could its square (the variance) be something like the following?

This is what cross validation (CV) does.
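A minimal sketch of the idea, using simulated data (the true accuracy of 72% and the fold counts are made-up numbers for illustration): cross-validation gives you several fold-level accuracy estimates, and their spread is an empirical estimate of the measurement uncertainty.

```python
import random
import statistics

random.seed(0)

# Hypothetical classifier with true accuracy 72%, evaluated with
# 5-fold CV on n = 1000 examples. Each fold's accuracy is the fraction
# of simulated correct predictions (Bernoulli draws) in that fold.
true_acc, n, k = 0.72, 1000, 5
fold_size = n // k
fold_accs = [
    sum(random.random() < true_acc for _ in range(fold_size)) / fold_size
    for _ in range(k)
]
print("fold accuracies:", [f"{a:.3f}" for a in fold_accs])
print(f"mean = {statistics.mean(fold_accs):.3f}, "
      f"std across folds = {statistics.stdev(fold_accs):.3f}")
```

The fold-to-fold standard deviation here is a continuous, data-driven stand-in for the binomial formula above, which is why CV is a practical answer to the question about a continuous distribution for performance.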