Anomaly Detection Algorithm Statistical Independence

yusufnzm · January 22, 2023, 2:44pm

What does Andrew Ng mean when he says, "But it turns out this algorithm often works fine even that the features are not actually statistically independent."?
Around 1:55

Christian_Simonis · January 22, 2023, 3:00pm

Hi @yusufnzm,

Here is a good definition for statistical independence:

Two events are independent, statistically independent, or stochastically independent [1] if, informally speaking, the occurrence of one does not affect the probability of occurrence of the other or, equivalently, does not affect the odds. Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other.

Source
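For reference, here is the formal version of that definition in standard textbook notation (added for illustration, not part of the quoted source). The last line is the per-feature Gaussian model used in the course's anomaly detection algorithm, which multiplies the feature densities exactly as if the features were independent:

```latex
% Events A and B are independent iff
P(A \cap B) = P(A)\,P(B)
% Continuous random variables X and Y (with a joint density) are independent iff
f_{X,Y}(x, y) = f_X(x)\, f_Y(y) \quad \text{for all } x, y
% Anomaly detection model: product of per-feature Gaussian densities
p(\mathbf{x}) = \prod_{j=1}^{n} p(x_j;\, \mu_j, \sigma_j^2)
```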

For example, if you draw a card from a deck (event A), put it back, and then draw again (event B), events A and B are independent, since you always draw at random from the "same deck".

However, if you did not put the card back after A, then A would affect B, and the two events would not be independent.
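The card example can be checked by exact enumeration. In this sketch (plain Python; labelling cards 0 to 3 as the aces is an arbitrary assumption), B is independent of A when P(B is an ace | A is an ace) equals P(B is an ace):

```python
from fractions import Fraction
from itertools import permutations, product

DECK = range(52)      # cards labelled 0..51
ACES = {0, 1, 2, 3}   # hypothetical labelling: cards 0-3 are the four aces


def conditional_and_marginal(draws):
    """Return (P(B ace | A ace), P(B ace)) over equally likely ordered draws (A, B)."""
    draws = list(draws)
    a_ace = [d for d in draws if d[0] in ACES]
    both_ace = [d for d in a_ace if d[1] in ACES]
    b_ace = [d for d in draws if d[1] in ACES]
    return Fraction(len(both_ace), len(a_ace)), Fraction(len(b_ace), len(draws))


# With replacement: every ordered pair, including drawing the same card twice.
with_repl = conditional_and_marginal(product(DECK, repeat=2))
# Without replacement: the second card can never be the first card again.
without_repl = conditional_and_marginal(permutations(DECK, 2))

print(f"with replacement:    P(B ace | A ace) = {with_repl[0]}, P(B ace) = {with_repl[1]}")
print(f"without replacement: P(B ace | A ace) = {without_repl[0]}, P(B ace) = {without_repl[1]}")
```

With replacement both probabilities are 1/13. Without replacement the conditional probability drops to 3/51 = 1/17 while the marginal stays at 1/13, so A does affect B.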

In terms of data features, you can also say: if two features are independent, they are uncorrelated. So their Pearson correlation coefficient is, strictly speaking, 0, and when estimated from a large sample it is at least close to zero.
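A minimal NumPy sketch of this point (the synthetic data, sample size and seed are illustrative assumptions): two independently drawn features have a sample Pearson correlation near 0, while a feature built from x1 plus noise does not.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000

# Two features drawn independently of each other.
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)

# A feature that depends linearly on x1 (plus noise), for comparison.
x2_dep = x1 + rng.normal(size=n)

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is Pearson's r.
r_indep = np.corrcoef(x1, x2_indep)[0, 1]
r_dep = np.corrcoef(x1, x2_dep)[0, 1]

print(f"independent features: r = {r_indep:.3f}")  # close to 0
print(f"dependent features:   r = {r_dep:.3f}")    # close to 1/sqrt(2), about 0.707
```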

You can often learn a lot by visualizing the data to see whether features are statistically dependent or not. See also this source, where the residuals are evaluated (there you do not want any correlation, or more generally any statistical dependency, between the residuals and your features).

What Prof. Ng means is that anomaly detection often works well even though the features might not be independent of each other. In my experience, it is also possible that the features carry mutual information or have a slight correlation. How to tackle this and get rid of redundant information in your features is described in this repo. The benefit is a better ratio of data to feature space, which can contribute to better generalisation.
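To see what "often works fine" can look like, here is a minimal NumPy sketch of the per-feature Gaussian model, p(x) = p(x1; μ1, σ1²) · p(x2; μ2, σ2²), fitted on two deliberately correlated synthetic features. The data, the seed and the way ε is picked (the 1st percentile of the training densities, instead of tuning it on a labelled cross-validation set) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical training data: two features with correlation 0.8 (NOT independent).
n = 5_000
x1 = rng.normal(loc=0.0, scale=1.0, size=n)
x2 = 0.8 * x1 + rng.normal(loc=0.0, scale=0.6, size=n)
X_train = np.column_stack([x1, x2])

# Fit one Gaussian per feature (mean and variance), ignoring the correlation.
mu = X_train.mean(axis=0)
var = X_train.var(axis=0)


def p(X):
    """p(x) = product over features j of N(x_j; mu_j, var_j)."""
    dens = np.exp(-((X - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    return dens.prod(axis=1)


# Assumed threshold: 1st percentile of the training densities.
epsilon = np.quantile(p(X_train), 0.01)

X_new = np.array([
    [0.1, 0.0],   # typical point
    [4.0, 4.0],   # far from the data in both features
    [1.5, -1.5],  # each value plausible alone, but contradicts the correlation
])
flags = p(X_new) < epsilon  # True means "flagged as anomaly"
print(flags)
```

In this sketch the point that is far away in both features is flagged even though the independence assumption is violated. The point [1.5, -1.5] is not flagged, although it is very unusual given the strong positive correlation; anomalies that show up only in the combination of features are where ignoring the dependency can hurt.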

Hope that helps!

Best regards
Christian

Christian_Simonis · January 22, 2023, 3:11pm

Side note for completeness: the conclusion does not hold the other way around. If the correlation between two features is 0, that does not mean the features are independent; this is not true in general.

The reason is that the Pearson correlation only assesses linear dependency, while other kinds of dependency are still possible, e.g. a circular pattern (see also this viz).
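A small NumPy sketch of the circular case (points evenly spaced on the unit circle, chosen for illustration): the Pearson correlation is numerically zero, yet the two features are completely tied together by x1² + x2² = 1.

```python
import numpy as np

# Points evenly spaced on the unit circle (illustrative data, not from the thread).
theta = np.linspace(0.0, 2.0 * np.pi, 1_000, endpoint=False)
x1 = np.cos(theta)
x2 = np.sin(theta)

# Pearson correlation is (numerically) zero ...
r = np.corrcoef(x1, x2)[0, 1]
print(f"Pearson r = {r:.2e}")

# ... yet the features are strongly dependent: x1**2 + x2**2 == 1 for every point,
# so knowing x1 leaves only two possible values for x2: +/- sqrt(1 - x1**2).
on_circle = np.allclose(x1**2 + x2**2, 1.0)
print("every point satisfies x1^2 + x2^2 = 1:", on_circle)
```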

I would really recommend visualising your data! This often helps you find patterns or spot dependencies.

Best regards
Christian
