Alternative method for Anomaly Detection (Week 1)

Dear Andrew,

I am a great fan of your courses and have followed many of them. Anomaly detection is the only topic so far where I felt that the method you suggest, fitting a Gaussian to each feature by maximum likelihood, is not optimal and that there may be a better way.

Here is my suggestion:

Why not use kernel density estimates (KDE)?

Method: We estimate the probability distribution as follows:

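P(x) = 1/m * sum_{i=1}^m( prod_{j=1}^n( G(x_j; X_j^i, sigma) ) )
where
G(x_j; mu, sigma) is the one-variable Gaussian density shown in the course, here with mean mu = X_j^i and standard deviation sigma
X^i are the training examples and X_j^i is feature j of example i
m is the number of training examples and n is the number of features.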
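
To make the formula concrete, here is a minimal NumPy sketch of it. The function names (gaussian, kde_density, is_anomaly) and the choice of one scalar sigma shared by all features are my own assumptions for illustration, not anything from the course.

```python
import numpy as np

def gaussian(x, mu, sigma):
    """One-variable Gaussian density, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def kde_density(x, X_train, sigma):
    """P(x) = 1/m * sum_i prod_j G(x_j; X_j^i, sigma).

    x:       query point, shape (n,)
    X_train: training examples, shape (m, n)
    sigma:   bandwidth, assumed here to be a single scalar for all features
    """
    x = np.asarray(x, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    per_feature = gaussian(x, X_train, sigma)  # shape (m, n) via broadcasting
    per_example = per_feature.prod(axis=1)     # product over features, shape (m,)
    return per_example.mean()                  # average over training examples

def is_anomaly(x, X_train, sigma, epsilon):
    """Flag x as anomalous when its estimated density falls below epsilon."""
    return kde_density(x, X_train, sigma) < epsilon
```

For many features or many training examples the product of densities can underflow, so a practical version would probably work with log-densities instead.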

Just as epsilon controls the trade-off between precision and recall, the parameter sigma in my suggestion also controls it: high sigma → low precision, high recall, and low sigma → high precision, low recall.

Although KDE is not a proper statistical inference tool, we are not after the exact distribution of the training set here.

The pros of my suggestion:

  1. Can address correlated features
  2. No need to transform non-Gaussian features to make them Gaussian. That process is already very cumbersome when there are, for example, more than 20 features.
  3. The course method cannot address multimodal distributions (a double bell, for example), while my suggestion can.
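
As a rough illustration of point 3, the sketch below compares a single Gaussian fitted by maximum likelihood with a KDE on a one-dimensional "double bell". The training values, the bandwidth of 0.5 and the function names are made up for the example.

```python
import numpy as np

def gaussian(x, mu, sigma):
    """One-variable Gaussian density, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Made-up 1-D training set with two clusters, near -3 and near +3.
X_train = np.array([-3.2, -3.0, -2.8, 2.8, 3.0, 3.2])

def single_gaussian_density(x, X):
    """Course-style estimate: one Gaussian with MLE mean and variance."""
    mu = X.mean()
    sigma = X.std()  # np.std divides by m by default, matching the MLE
    return gaussian(x, mu, sigma)

def kde_density_1d(x, X, sigma):
    """KDE estimate: average of Gaussians centred on each training example."""
    return gaussian(x, X, sigma).mean()

# x = 0.0 lies in the empty gap between the clusters; x = 3.0 is a cluster centre.
for x in (0.0, 3.0):
    print(x, single_gaussian_density(x, X_train), kde_density_1d(x, X_train, 0.5))
```

On this data the single Gaussian gives the empty gap at 0 a higher density than the cluster centre at 3, so no epsilon can flag the gap without also flagging the cluster, while the KDE ranks the two points the other way round.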

I would be very grateful to receive your feedback, Andrew; it would help me gain a deeper understanding of anomaly detection.

Kind regards,
Shankha.

Thanks for your suggestions.

Sorry, Andrew does not monitor the forums.