C3_W1 Why use the Gaussian distribution

Andromeda18 · September 8, 2022, 4:27pm

Hi,

In the lectures, Professor Ng uses the Gaussian distribution to perform anomaly detection. I was wondering why the Gaussian distribution is used instead of other distributions.

Lukas_Mendes · September 8, 2022, 10:13pm

Hi @Andromeda18!
We have to remember that this is one approach to the problem at hand, and there are others. In this approach, we use the statistical behavior of a normal distribution to detect anomalies. The normal distribution has several properties that make it convenient for this problem: it is fully described by its mean and variance, and the probability of a value falls off quickly and predictably as it moves away from the mean. I recommend studying the normal distribution a little more, and I'm sure everything will become clearer.
To put it simply: if an example lies more than three standard deviations from the mean of a normal distribution, its probability under that distribution is very small (values that far out occur only about 0.3% of the time). Do you agree?
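To make the "three standard deviations" intuition concrete, here is a minimal sketch that computes the two-sided tail probability of a standard normal. The function name `tail_probability` is just an illustrative choice, not something from the course.

```python
import math

def tail_probability(k: float) -> float:
    """Probability that a normal variable lands more than k standard
    deviations away from its mean, on either side.

    For a standard normal Z, P(|Z| > k) = erfc(k / sqrt(2)).
    """
    return math.erfc(k / math.sqrt(2))

# The probability shrinks quickly as we move away from the mean.
for k in (1, 2, 3):
    print(f"P(|Z| > {k} sigma) = {tail_probability(k):.4f}")
```

This only holds if the feature really is (approximately) normal, which is the assumption being discussed in this thread.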
Hope this helps!

rmwkwok · September 9, 2022, 1:00am

Hello @Andromeda18, when we build our next anomaly detection system, it’s our job to verify that the samples’ distribution matches our model assumptions. For example, in the video, we assumed that the samples are gaussian distributed on each feature, and we assumed independence among features.
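As a rough sketch of what those two assumptions imply, the density of an example becomes a product of one-dimensional gaussian densities, one per feature. The helper names (`fit_gaussian`, `density`), the synthetic data, and the threshold `epsilon` below are all hypothetical choices for illustration.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate a mean and a variance for each feature (column) of X."""
    return X.mean(axis=0), X.var(axis=0)

def density(X, mu, var):
    """p(x) as a product of per-feature gaussian densities.

    Taking the product across features is exactly where the
    independence assumption enters.
    """
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    expo = np.exp(-((X - mu) ** 2) / (2.0 * var))
    return np.prod(coef * expo, axis=1)

# Synthetic training data: two independent gaussian features (an assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(1000, 2))
mu, var = fit_gaussian(X_train)

p_typical = density(np.array([[5.0, 10.0]]), mu, var)[0]  # near the mean
p_far = density(np.array([[10.0, 20.0]]), mu, var)[0]     # far from the mean

epsilon = 1e-4  # hypothetical threshold; in practice tuned on labeled validation data
print("typical flagged:", p_typical < epsilon)
print("far flagged:", p_far < epsilon)
```

If the features are not gaussian or not independent, `p(x)` computed this way can be misleading, which is why the checks are worth doing.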

To check whether a feature is gaussian distributed, we can test it quantitatively with a method like the Kolmogorov–Smirnov test. Qualitatively, we can examine whether the process that generates the samples on that feature dimension is additive. For example, the distribution of the environmental noise level is likely to be gaussian because the noise level is the sum of various noise sources: a car passing by, pedestrians talking on the phone or to each other, construction work, and so on. While any one of these can be non-gaussian, their sum tends toward a gaussian according to the central limit theorem, provided there are many roughly independent sources and none of them dominates.
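A minimal sketch of the quantitative check, using `scipy.stats.kstest` on synthetic data. The two features and the helper name `ks_against_fitted_normal` are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two synthetic features: one gaussian, one clearly skewed.
gaussian_feature = rng.normal(loc=3.0, scale=2.0, size=2000)
skewed_feature = rng.exponential(scale=2.0, size=2000)

def ks_against_fitted_normal(x):
    """Kolmogorov-Smirnov test of x against a normal distribution whose
    mean and standard deviation are estimated from x itself.

    Caveat: estimating the parameters from the same sample makes the
    reported p-value optimistic, so treat it as a rough screen only.
    """
    mu, sigma = x.mean(), x.std(ddof=1)
    return stats.kstest(x, "norm", args=(mu, sigma))

res_gaussian = ks_against_fitted_normal(gaussian_feature)
res_skewed = ks_against_fitted_normal(skewed_feature)

print("gaussian feature:", res_gaussian.statistic, res_gaussian.pvalue)
print("skewed feature:  ", res_skewed.statistic, res_skewed.pvalue)
```

A small p-value suggests the gaussian assumption is doubtful for that feature. A histogram or a Q-Q plot is a useful companion, because with large samples the test can reject for deviations that are too small to matter in practice.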

Many processes are additive, so the gaussian distribution is a pretty popular choice for modeling a random variable.
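Here is a small simulation of the additive argument, assuming independent, identically distributed exponential "sources" (an arbitrary, clearly non-gaussian choice). It compares the skewness of a single source with the skewness of the sum. A gaussian has skewness 0.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_samples, n_sources = 10_000, 30

# Each column is one non-gaussian source; each row is one observation.
sources = rng.exponential(scale=1.0, size=(n_samples, n_sources))

single = sources[:, 0]        # one source on its own
total = sources.sum(axis=1)   # the additive process

# In theory, the exponential has skewness 2, and the sum of n i.i.d.
# exponentials has skewness 2 / sqrt(n), so it shrinks as n grows.
skew_single = stats.skew(single)
skew_total = stats.skew(total)

print("skewness of one source:", skew_single)
print("skewness of the sum:   ", skew_total)
```

The sum is much closer to symmetric than any individual source, although with only 30 sources it is still not exactly gaussian.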

Raymond

Andromeda18 · September 9, 2022, 2:58pm

Hi,

@Lukas_Mendes, I definitely agree with you that examples that are more than 3-sigma away from the mean are far less likely to occur. I suppose my question was about the assumption that the data is normally distributed. I realize that the normal distribution applies to many natural phenomena and I know it’s a very popular choice for modelling random variables, but personally I feel more comfortable undertaking some type of formal evaluation of the data’s distribution, like the one @rmwkwok mentioned.
