Anomaly Detection with Different Probability Distributions

Terry_Green · August 4, 2022, 5:03pm

The lectures on anomaly detection discuss transforming the input data so that it closely resembles a Gaussian distribution. Could we also consider fitting other probability distributions if we cannot transform the data to look Gaussian? In practice, are other distributions used? One example I am thinking of is reliability analysis for time to failure, with distributions such as Weibull or Exponential.
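To make the question concrete, here is a minimal sketch of what "fitting another distribution" could look like for time-to-failure data. It assumes SciPy's `weibull_min`, synthetic data with made-up shape and scale values, and a hypothetical threshold `epsilon` picked as the 1% quantile of the training densities. It mirrors the density-threshold recipe from the lectures and is not a definitive implementation.

```python
import numpy as np
from scipy import stats

# Synthetic time-to-failure data in hours; shape and scale are made up.
rng = np.random.default_rng(0)
times = stats.weibull_min.rvs(c=2.0, scale=1000.0, size=500, random_state=rng)

# Fit a two-parameter Weibull by maximum likelihood (location fixed at 0).
shape, loc, scale = stats.weibull_min.fit(times, floc=0)
fitted = stats.weibull_min(shape, loc=loc, scale=scale)

# Same recipe as with a Gaussian: flag x when p(x) < epsilon.
# Assumption: epsilon is the 1% quantile of the training densities.
epsilon = np.quantile(fitted.pdf(times), 0.01)

def is_anomaly(x):
    """Return a boolean array: True where the fitted density is below epsilon."""
    return fitted.pdf(np.asarray(x, dtype=float)) < epsilon

print(shape, scale)
print(is_anomaly([900.0, 6000.0]))
```

In a real setting `epsilon` would be tuned on a labelled cross-validation set rather than taken from a fixed quantile.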

gent.spah · August 4, 2022, 6:41pm

My first thought on this is that the Gaussian distribution models many naturally occurring phenomena. If you look at nature, things tend to be in balance, so this distribution is spread around a balanced (middle) point. This makes it a good candidate for natural phenomena. Other distributions could be used as well if they describe a phenomenon better, but I doubt that there is any natural phenomenon that is not balanced in some way. Have a look at this link too: Normal distribution

JAVIER_HERNANDEZ1 · February 16, 2023, 5:45pm

I agree with you, because that is the point of anomaly detection: detecting whatever falls outside the naturally occurring phenomena.

shanup · February 16, 2023, 7:20pm

The normal distribution has been studied thoroughly and has established metrics such as the z-score. So it is easy to assess that a sample that is $3\sigma$ away from the mean would be a rare occurrence (about 0.27% of samples under a Gaussian).

If we have similar established standards for other distributions, then by all means we could use those as well for anomaly detection.
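As a rough illustration of both points, the sketch below computes a z-score on synthetic Gaussian data and then carries the same rarity standard over to another distribution through its tail probability. The data, the exponential scale `expon_scale`, and all variable names are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Synthetic feature values; the mean and spread are made up.
rng = np.random.default_rng(1)
x = rng.normal(loc=50.0, scale=5.0, size=1000)
mu, sigma = x.mean(), x.std()

def z_score(value):
    """Number of standard deviations between value and the sample mean."""
    return (value - mu) / sigma

# Probability mass a Gaussian places more than 3 sigma from the mean
# (both tails together), roughly 0.0027.
tail_mass = 2 * stats.norm.sf(3.0)

# The same standard for an exponential feature (hypothetical scale):
# the value that is exceeded with probability tail_mass.
expon_scale = 200.0
expon_cutoff = stats.expon.isf(tail_mass, scale=expon_scale)

print(z_score(70.0))
print(tail_mass)
print(expon_cutoff)
```

The idea is that "3 sigma" is just a convenient name for a tail probability, so any distribution with a CDF can be given an equivalent cutoff.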

Christian_Simonis · February 16, 2023, 8:42pm

This is a really good question!

When it comes to anomalies or black swan events, many phenomena in fact do not follow a Gaussian distribution but have heavier tails (e.g. returns in the stock market or medical health indicator data).

So using fat-tailed distributions like Student's t-distribution can absolutely work for anomaly detection, and it often makes a lot of sense, especially since long-tail events come with significant costs if they are not detected (the cost of false negatives). Here is a paper using a generalised Student-t approach for anomaly detection which could be interesting to take a look at: https://people.cs.vt.edu/~clu/Publication/2013/AAAI-Lu-2013.pdf
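A small sketch of why the tail choice matters, assuming synthetic heavy-tailed data and SciPy's generic maximum-likelihood `fit` methods (this is not the method from the linked paper): a Gaussian and a Student's t are fitted to the same sample, and the probability each assigns to an extreme value is compared.

```python
import numpy as np
from scipy import stats

# Synthetic heavy-tailed sample; df=3 is a made-up choice for illustration.
rng = np.random.default_rng(2)
data = stats.t.rvs(df=3, size=2000, random_state=rng)

# Fit both candidate models by maximum likelihood.
mu, sigma = stats.norm.fit(data)
df_hat, loc_hat, scale_hat = stats.t.fit(data)

# Upper-tail probability that each model assigns to an extreme observation.
extreme = 10.0
p_norm = stats.norm.sf(extreme, loc=mu, scale=sigma)
p_t = stats.t.sf(extreme, df_hat, loc=loc_hat, scale=scale_hat)

print(p_norm, p_t)
```

On data like this the fitted Gaussian treats the extreme value as far less likely than the t model does, so one probability threshold behaves very differently in the tails depending on which model is used.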

Another note on anomaly detection models: vanilla variational autoencoders, for example, generally rely on a Gaussian prior in the latent space, but extensions to fat-tailed distributions such as Student's t-distribution are also discussed in the literature: [2004.02581] Variational auto-encoders with Student's t-prior

Hope that helps, @Terry_Green!

Best regards
Christian
