Difference between outlier and anomaly

Hi @tbhaxor

For example, a popular approach is to learn your normal behaviour as a "normal cluster" and, if a data point lies too far away from this cluster, conclude that it is an anomaly.
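To make the cluster idea concrete, here is a minimal sketch. It assumes the normal data forms one roughly elliptical cluster, uses the Mahalanobis distance as the "how far away" measure, and takes the 99.9th percentile of the training distances as the cut-off. The data, the names `mahalanobis` and `is_anomaly`, and the threshold are all illustrative choices of mine, not a fixed recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "normal" training data: one 2-D cluster around the origin.
normal_data = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))

# "Learn" the normal cluster as its mean and (inverse) covariance.
mean = normal_data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(normal_data, rowvar=False))


def mahalanobis(points):
    """Mahalanobis distance of each row of `points` from the normal cluster."""
    diff = np.atleast_2d(points) - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))


# Assumed cut-off: the 99.9th percentile of distances seen in the normal data.
threshold = np.quantile(mahalanobis(normal_data), 0.999)


def is_anomaly(points):
    """True for every point that lies farther away than the cut-off."""
    return mahalanobis(points) > threshold


# One point close to the cluster centre, one far away from it.
print(is_anomaly([[0.1, -0.2], [8.0, 8.0]]))
```

How strict the cut-off should be depends on how many false alarms you can tolerate, so treat the percentile as a tuning knob.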

Autoencoders, for example, are a popular choice for anomaly detection, provided you have a sufficient amount of data labelled as normal and the problem is suitable. Can you provide more details on your specific problem?
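In case it helps while you collect those details, here is the reconstruction-error idea behind autoencoders in a few lines. Note that this is not a neural network: as a stand-in I use a linear "autoencoder" built from PCA (via `np.linalg.svd`), trained on normal data only. The toy data, the bottleneck size of 1, and the 99.9th percentile threshold are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical normal data: 3-D points lying close to a 1-D line.
t = rng.normal(size=(500, 1))
direction = np.array([[1.0, 2.0, -1.0]])
normal_data = t @ direction + 0.05 * rng.normal(size=(500, 3))

# Fit on normal data only: the top principal direction plays the role
# of the encoder/decoder weights of a linear autoencoder.
mean = normal_data.mean(axis=0)
_, _, vt = np.linalg.svd(normal_data - mean, full_matrices=False)
components = vt[:1]  # bottleneck of size 1 (an assumption)


def reconstruction_error(points):
    """Encode, decode, and measure how much of each point is lost."""
    centered = np.atleast_2d(points) - mean
    code = centered @ components.T   # "encode" to the bottleneck
    restored = code @ components     # "decode" back to 3-D
    return np.linalg.norm(centered - restored, axis=1)


# Assumed cut-off: 99.9th percentile of the error on the normal data.
threshold = np.quantile(reconstruction_error(normal_data), 0.999)


def is_anomaly(points):
    """True where the point cannot be reconstructed well."""
    return reconstruction_error(points) > threshold


# First point follows the normal pattern, second one breaks it.
print(is_anomaly([[2.0, 4.0, -2.0], [2.0, -4.0, 2.0]]))
```

A real autoencoder replaces the linear projection with a trained nonlinear encoder and decoder, but the decision rule (large reconstruction error means anomaly) stays the same.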

To differentiate the two, you could check whether the distribution assumptions hold overall. For example, if you assume a normal (Gaussian) distribution, all normal data should follow it, including potential black swan events (I think these are what you call statistical outliers) that occur only very rarely. After all, the normal distribution is defined over an unbounded range. Sampling a sufficiently large amount of representative data helps ensure that the true distribution is approximated acceptably.
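One simple way to run such a check, sketched under the assumption of a Gaussian model: compare the fraction of points beyond k standard deviations with what the normal distribution predicts, which is `erfc(k / sqrt(2))` for a two-sided tail. The simulated `samples` and the helper names below are illustrative only; with your own data you would pass in your measurements instead.

```python
import math

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical large, representative sample of "normal" behaviour.
samples = rng.normal(loc=0.0, scale=1.0, size=1_000_000)


def tail_fraction_expected(k):
    """P(|Z| > k) for a standard normal variable Z."""
    return math.erfc(k / math.sqrt(2.0))


def tail_fraction_observed(x, k):
    """Fraction of points more than k standard deviations from the mean."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(np.abs(z) > k))


# Rare extreme values are expected under the model; what matters is
# whether they occur about as often as the model predicts.
for k in (2, 3, 4):
    print(k, tail_fraction_expected(k), tail_fraction_observed(samples, k))
```

If the observed tail fractions are close to the expected ones, the extreme points are plausibly just rare draws from the normal distribution (statistical outliers). If they are clearly more frequent, either the distribution assumption is off or a different mechanism is producing them, which points towards anomalies.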

This thread might be worth a look, too: Anomaly Detection with Different Probability Distributions - #5 by Christian_Simonis

Best regards
Christian