I am quite interested in applying anomaly detection to a real use case, but there is one thing I don’t quite understand.
Let us go with the airplane engine example looking at the feature “frequency of vibration”. The vibrations will probably form some sort of probability density function that contains frequencies based on the speed of the airplane.
As the pilot speeds up the airplane, the frequency increases too. At high enough or low enough speeds, the “frequency measurements” will deviate enough from the mean that they are flagged as anomalies.
Q: How is it possible to differentiate between anomalies caused by varying speed and anomalies caused by the motor actually being broken?
Let me know if I need to clarify anything about my question.
I don’t know much about the dynamics of a plane, but let’s draw a parallel with a car.
The car has an allowed speed range and an RPM range. An RPM or speed of 0 is within the allowed range, so the corresponding vibration and heat generated would also fall within the allowed range.
A high speed or high RPM would also be within the allowed operating range. Granted, heat and vibration increase with speed/RPM, but these values should still fall within the allowed range, even if they are on the higher side.
The anomaly would happen if we are within the operable range in terms of speed/RPM and the heat or vibration goes into rare territory. This would be our low-probability event.
Going back to your comment about the mean: deviating from the mean while staying within 1 standard deviation, or sometimes even within 2, might still not be considered an anomaly. Depending on the domain of application, we decide how many standard deviations from the mean constitute an anomaly.
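To make the "how many standard deviations" point concrete, here is a rough Python sketch. The feature names, means, standard deviations and the threshold `k` are all made-up values for illustration, not numbers from the course or from a real engine.

```python
import numpy as np

def is_anomaly(x, mean, std, k=3.0):
    """Flag x if ANY feature lies more than k standard deviations from its mean."""
    z = np.abs((np.asarray(x, dtype=float) - mean) / std)
    return bool(np.any(z > k))

# Hypothetical statistics for two features: [vibration, heat]
mean = np.array([50.0, 90.0])
std = np.array([5.0, 10.0])

reading = [58.0, 105.0]  # z-scores: 1.6 and 1.5

print(is_anomaly(reading, mean, std, k=3.0))  # lenient domain
print(is_anomaly(reading, mean, std, k=1.5))  # strict domain
```

The same reading passes with `k=3.0` but is flagged with `k=1.5`, which is the sense in which the cut-off is a domain decision rather than something the data dictates.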
So if just one of the parameters falls “out of range” from what is considered normal, we flag it as an anomaly.
I’m guessing the key here is to have enough parameters, and then “separate” the model into sections. For instance, with the car, have one model for 0-10 km/h and another for 10-20 km/h?
If you go back to the videos, you can see that we use the joint probability to calculate the probability of such an event happening as follows:
p(x) = \prod_{j=1}^{n} p(x_j)
So, we don’t just look at one variable in isolation.
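As a minimal sketch of that product, assuming each feature is modeled as an independent Gaussian fitted on normal data: the synthetic data, the two features (vibration, heat) and the threshold `epsilon` below are my own illustrative assumptions.

```python
import numpy as np

def fit_gaussians(X):
    """Estimate a mean and variance per feature (column) from normal examples."""
    return X.mean(axis=0), X.var(axis=0)

def joint_density(X, mu, var):
    """p(x) = prod_j p(x_j), with each p(x_j) a univariate Gaussian density."""
    p = np.exp(-((X - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return p.prod(axis=1)

# Synthetic "normal operation" data; columns are [vibration, heat] (made up)
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[50.0, 90.0], scale=[5.0, 10.0], size=(1000, 2))
mu, var = fit_gaussians(X_train)

epsilon = 1e-5  # assumed threshold; in practice tuned on labeled validation data

x_typical = np.array([[52.0, 95.0]])
x_odd = np.array([[80.0, 140.0]])

for name, x in [("typical", x_typical), ("odd", x_odd)]:
    p = joint_density(x, mu, var)[0]
    print(name, p, "anomaly" if p < epsilon else "ok")
```

Because the densities are multiplied, a reading is judged on all features together, and one very unlikely feature is enough to pull p(x) below the threshold.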
This is a good suggestion, although capturing the data in the first place would be quite an arduous task. If we have to look at 0-10 km/h and capture all the values of the other parameters at this speed, and then do the same for 10-20 km/h and so on for every category of speed, it is not going to be easy. But then it could be argued that not doing so could lead to false negatives: what if the speed is only 9 km/h but the heat or vibration generated is what should happen only at 20 km/h? Valid point. The data distributions that we discuss in the videos do not have this kind of bifurcation.
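Here is a rough sketch of the per-speed-range idea, to show the false negative from the 9 km/h example. Everything in it is an assumption for illustration: the synthetic relation between speed and vibration, the 10 km/h bin width, and the 3-standard-deviation cut-off.

```python
import numpy as np

# Synthetic normal data: vibration grows with speed (made-up relation)
rng = np.random.default_rng(0)
speed = rng.uniform(0.0, 30.0, size=3000)                   # km/h
vibration = 2.0 * speed + rng.normal(0.0, 1.0, size=3000)

bin_edges = np.array([0.0, 10.0, 20.0, 30.0])               # 0-10, 10-20, 20-30 km/h
n_bins = len(bin_edges) - 1

def bin_index(s):
    """Map a speed (scalar or array) to the index of its speed range."""
    return np.clip(np.digitize(s, bin_edges) - 1, 0, n_bins - 1)

# One mean/std of vibration per speed range
idx = bin_index(speed)
bin_mean = np.array([vibration[idx == b].mean() for b in range(n_bins)])
bin_std = np.array([vibration[idx == b].std() for b in range(n_bins)])

def z_global(v):
    """Z-score of vibration against a single model over all speeds."""
    return abs(v - vibration.mean()) / vibration.std()

def z_binned(s, v):
    """Z-score of vibration against the model for the matching speed range."""
    b = bin_index(s)
    return abs(v - bin_mean[b]) / bin_std[b]

# 9 km/h, but vibration that would be typical around 20 km/h
print("global z:", z_global(40.0))
print("binned z:", z_binned(9.0, 40.0))
```

With these assumptions the single global model sees a vibration of 40 as unremarkable, while the 0-10 km/h model scores it well beyond 3 standard deviations. Coarse bins still contain some of the speed trend, so narrower bins tighten the check at the cost of needing more data per bin.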