Hi @yusufnzm,
Here is a good definition for statistical independence:
Two events are independent, statistically independent, or stochastically independent [1] if, informally speaking, the occurrence of one does not affect the probability of occurrence of the other or, equivalently, does not affect the odds. Similarly, two random variables are independent if the realization of one does not affect the probability distribution of the other.
For example, if you draw a card from a deck (event A), put it back, and then draw again (event B), the events A and B are independent, since you always draw randomly from the "same deck".
However, if you did not put the card back after A, then A would affect B, and the two events would no longer be independent.
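To make the card example concrete, here is a small sketch that enumerates all possible pairs of draws and compares P(B) with P(B | A). It is only an illustration under my own assumptions: I take A = "first card is an ace" and B = "second card is an ace", and the names `DECK` and `second_draw_probs` are made up for this post.

```python
from fractions import Fraction
from itertools import permutations, product

# Hypothetical simplified deck: 4 aces and 48 other cards.
DECK = ["ace"] * 4 + ["other"] * 48


def second_draw_probs(replace):
    """Return (P(B), P(B | A)) with A = first card is an ace, B = second card is an ace."""
    positions = range(len(DECK))
    # With replacement every ordered pair is possible; without it, a card cannot repeat.
    if replace:
        pairs = list(product(positions, repeat=2))
    else:
        pairs = list(permutations(positions, 2))
    n_a = sum(1 for i, _ in pairs if DECK[i] == "ace")
    n_b = sum(1 for _, j in pairs if DECK[j] == "ace")
    n_both = sum(1 for i, j in pairs if DECK[i] == "ace" and DECK[j] == "ace")
    return Fraction(n_b, len(pairs)), Fraction(n_both, n_a)


p_b_with, p_b_given_a_with = second_draw_probs(replace=True)
p_b_without, p_b_given_a_without = second_draw_probs(replace=False)

print(p_b_with, p_b_given_a_with)        # 1/13 1/13 -> knowing A changes nothing
print(p_b_without, p_b_given_a_without)  # 1/13 1/17 -> knowing A changes the odds
```

With replacement, P(B | A) equals P(B), which is exactly the definition of independence. Without replacement, the conditional probability drops, so the events are dependent.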
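For data features you can also say: independence implies that the two features are uncorrelated, so their Pearson correlation coefficient is 0 in theory and close to zero in a finite sample. Note that the converse does not hold in general: a correlation near zero only rules out a linear relationship, and the features can still be dependent in a nonlinear way.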
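One caveat worth checking yourself: a Pearson coefficient of zero does not by itself prove independence. The sketch below uses toy data I made up (not from the course): `ys_quadratic` is completely determined by `xs`, yet the coefficient is zero because the relationship is not linear. The helper `pearson` is a plain hand-written implementation so the example needs nothing beyond the standard library.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Toy data (assumption for illustration): symmetric values around zero.
xs = [-2, -1, 0, 1, 2]
ys_linear = [2 * x + 1 for x in xs]   # linear dependence on x
ys_quadratic = [x ** 2 for x in xs]   # nonlinear, but fully determined by x

r_linear = pearson(xs, ys_linear)        # approximately 1
r_quadratic = pearson(xs, ys_quadratic)  # 0, although y depends on x

print(r_linear, r_quadratic)
```

So a correlation close to zero is a useful first check, but a scatter plot or a measure such as mutual information is needed to catch nonlinear dependencies.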
Often you learn a lot by visualizing the data to see whether the features are statistically dependent or not; see also this source, where the residuals are evaluated (there you do not want any correlation, or let's say statistical dependency, with your features).
What Prof. Ng means is that anomaly detection often works well even when the features are not independent of each other. In my experience, it is also possible that the features carry mutual information or are slightly correlated. How to tackle this and get rid of redundant information in your features is described in this repo. The benefit is a better ratio of data to feature space, which can contribute to better generalisation.
Hope that helps!
Best regards
Christian