What does Andrew Ng mean when he says, "But it turns out this algorithm often works fine even if the features are not actually statistically independent"?

Around 1:55

Hi @yusufnzm,

Here is a good definition for statistical independence:

Two events are independent, statistically independent, or stochastically independent if, informally speaking, the occurrence of one does not affect the probability of occurrence of the other or, equivalently, does not affect the odds. Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other.

For example, if you draw a card from a deck (event A), put it back, and then draw again (event B), the events A and B are independent, because both draws are made at random from the same full deck.

However, if you do not put the card back after A, then A affects B, and the two events are no longer independent.
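The card example can be checked with exact fractions. This is only a small sketch: the choice of "the card is an ace" as the event, and the standard 52-card deck with 4 aces, are assumptions made for illustration.

```python
from fractions import Fraction

# Assumed setup for illustration: standard 52-card deck with 4 aces.
DECK_SIZE = 52
ACES = 4

# Event A: the first card is an ace. Event B: the second card is an ace.
p_a = Fraction(ACES, DECK_SIZE)

# With replacement: the deck is identical for the second draw.
p_b_given_a_with = Fraction(ACES, DECK_SIZE)
p_b_given_not_a_with = Fraction(ACES, DECK_SIZE)

# Without replacement: the first draw changes what is left in the deck.
p_b_given_a_without = Fraction(ACES - 1, DECK_SIZE - 1)
p_b_given_not_a_without = Fraction(ACES, DECK_SIZE - 1)


def marginal_b(p_b_given_a, p_b_given_not_a):
    """Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)


p_b_with = marginal_b(p_b_given_a_with, p_b_given_not_a_with)
p_b_without = marginal_b(p_b_given_a_without, p_b_given_not_a_without)

# A and B are independent exactly when P(B | A) == P(B).
independent_with = p_b_given_a_with == p_b_with
independent_without = p_b_given_a_without == p_b_without

print("with replacement:    P(B|A) =", p_b_given_a_with, " P(B) =", p_b_with)
print("without replacement: P(B|A) =", p_b_given_a_without, " P(B) =", p_b_without)
```

With replacement, P(B | A) equals P(B), so knowing A tells you nothing about B. Without replacement, the two values differ, which is exactly the dependence described above.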

For data features, you can also say: independence implies that two features are uncorrelated. Their Pearson correlation coefficient is then exactly 0 in theory, and close to 0 when estimated from a large sample.
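To see this on data, here is a minimal sketch using only the Python standard library. The feature names `x1`, `x2`, `x3`, the sample size, and the noise level are all made up for illustration; `pearson` is a hand-written helper, not a library function.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]         # drawn independently of x1
x3 = [a + 0.5 * rng.gauss(0, 1) for a in x1]     # built from x1, so dependent

r_indep = pearson(x1, x2)
r_dep = pearson(x1, x3)

print("independent features:", round(r_indep, 3))
print("dependent features:  ", round(r_dep, 3))
```

The coefficient for the independent pair should come out close to 0 (not exactly 0, because it is estimated from a finite sample), while the dependent pair shows a clearly positive value.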

You often learn a lot by visualizing the data to see whether features are statistically dependent or not. See also this source, where the residuals are evaluated (there you do not want any correlation, or rather any statistical dependency, with your features).

What Prof. Ng means is that anomaly detection often works well even though the features might not be independent of each other. The algorithm models p(x) as the product of the per-feature densities, which is only exact if the features are independent. In my experience, it is also possible that the features carry mutual information or are slightly correlated. How to tackle this and get rid of redundant information in your features is described in this repo. The benefit is a better ratio of data to feature space, which can contribute to better generalisation.
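As a rough illustration of that point, the sketch below fits one Gaussian per feature, multiplies the densities to get p(x), and flags a point when p(x) falls below a threshold, even though the two training features are deliberately correlated. The synthetic data, the helper names `fit_gaussians`, `density` and `is_anomaly`, and the choice of epsilon as the 1st percentile of the training densities are all assumptions for this example; in practice you would tune epsilon on a labelled validation set.

```python
import math
import random


def fit_gaussians(rows):
    """Estimate a mean and a variance for each feature separately."""
    n = len(rows)
    d = len(rows[0])
    mus = [sum(r[j] for r in rows) / n for j in range(d)]
    variances = [sum((r[j] - mus[j]) ** 2 for r in rows) / n for j in range(d)]
    return mus, variances


def density(x, mus, variances):
    """p(x) as a product of per-feature Gaussian densities (independence assumed)."""
    p = 1.0
    for xj, mu, var in zip(x, mus, variances):
        p *= math.exp(-((xj - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return p


rng = random.Random(0)

# Synthetic "normal" data: the second feature is correlated with the first.
train = []
for _ in range(2000):
    a = rng.gauss(0, 1)
    b = 0.6 * a + 0.8 * rng.gauss(0, 1)
    train.append((a, b))

mus, variances = fit_gaussians(train)

# Hypothetical threshold: the 1st percentile of the training densities.
scores = sorted(density(row, mus, variances) for row in train)
epsilon = scores[len(scores) // 100]


def is_anomaly(x):
    return density(x, mus, variances) < epsilon


typical_point = (0.1, 0.2)
far_outlier = (6.0, -5.0)

print("typical point flagged:", is_anomaly(typical_point))
print("far outlier flagged:  ", is_anomaly(far_outlier))
```

Even though the independence assumption is violated here, the typical point is accepted and the far outlier is flagged. What the product model cannot see is the correlation itself, so points that are unusual only in their combination of feature values are scored less sharply.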

Hope that helps!

Best regards

Christian

Side note for completeness: you cannot reverse this conclusion. A correlation of 0 between two features does not mean that those features are independent; this is not true in general.

The reason is that correlation only assesses linear dependency, while other kinds of dependency are still possible. See also this visualization, e.g. for a circular pattern / dependency:

- https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Correlation_examples2.svg/1200px-Correlation_examples2.svg.png
- Correlation - Wikipedia
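The circular case is easy to reproduce. The sketch below places points evenly on a unit circle (an assumed toy dataset) and computes the Pearson correlation with a hand-written `pearson` helper. It then correlates the squared coordinates to make the hidden dependency visible.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Toy data: 360 points spread evenly on the unit circle.
n = 360
angles = [2 * math.pi * k / n for k in range(n)]
xs = [math.cos(t) for t in angles]
ys = [math.sin(t) for t in angles]

# Linear correlation between the two features.
r_linear = pearson(xs, ys)

# The features are tied together by x^2 + y^2 = 1, so the squares
# are perfectly (negatively) linearly related.
r_squares = pearson([x * x for x in xs], [y * y for y in ys])

print("corr(x, y)     =", round(r_linear, 6))
print("corr(x^2, y^2) =", round(r_squares, 6))
```

The first coefficient is essentially 0 although knowing x pins y down to two possible values, and the second coefficient reveals the dependency that the linear measure misses.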

I would really recommend visualising your data! This often helps to find patterns or spot dependencies.

Best regards

Christian