Hi
I am working on an anomaly detection problem where we need to detect outliers in one specific KPI. Hence, the model itself is not complicated. On the other hand, the data it runs on is very complicated. It is telecommunications data: we monitor calls from different carriers, locations, etc. The distribution of the specific KPI differs a lot between carriers, locations, call directions, etc. That is, what looks like an outlier for carrier 1 is perfectly normal for carrier 2. Hence, not only the mean but also the standard deviation of the data in question varies a lot between carriers, locations, etc.
My idea was to standardise within each group. For example, one can compute a z-score using the mean and standard deviation for the specific carrier, location, etc.
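Roughly, what I have in mind (a minimal sketch with made-up column names and toy data, using pandas):

```python
import pandas as pd

# Toy data: the same KPI value can be normal for one carrier and unusual for another.
df = pd.DataFrame({
    "carrier": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "kpi":     [10.0, 12.0, 11.0, 9.0, 100.0, 105.0, 95.0, 98.0],
})

# Standardise within each group: z = (x - group mean) / group std.
grouped = df.groupby("carrier")["kpi"]
df["z"] = (df["kpi"] - grouped.transform("mean")) / grouped.transform("std")

# Since I only care about fluctuations below the mean, flag only large negative z.
df["outlier"] = df["z"] < -3
print(df)
```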
However, here a different problem arises. In some cases the distributions are very narrow, and some KPI values are wrongly marked as outliers although they lie in a normal range. For example, take a distribution with mean 60, min 45, max 70. I am only interested in fluctuations below the mean. In this specific distribution the vast majority of cases are around 60 and only a few cases are around 45. However, this does not make 45 an outlier in any practical sense.
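To make the problem concrete, here is a small simulation along the lines of the example above (made-up values, only the shape of the distribution matters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Narrow distribution: the vast majority of values sit near 60,
# with a few rare but practically normal values down near 45.
kpi = np.concatenate([
    rng.normal(60, 2, size=995),                 # bulk of the distribution
    np.array([45.0, 46.0, 47.0, 69.0, 70.0]),    # rare extremes within normal range
])

z = (kpi - kpi.mean()) / kpi.std()
# Because the std is so small, the value at 45 gets a very large negative
# z-score and would be flagged, even though it is practically fine.
print(z.min())
```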
This brings me to a more general question: if one trains on data whose distributions actually do not include any outliers, how can one eliminate the vast majority of false positives (false alarms) produced by the model? Do only heavily left-skewed distributions allow for proper outlier detection?
To summarise my two questions:
1. How would you recommend normalising data that come from different populations (e.g. carriers)?
2. How should one deal with distributions that do not contain outliers in a practical sense (that are not left-skewed enough, or are too narrow)?
And just out of curiosity: I actually ask myself why anomaly detection based on probability distributions and densities is considered machine learning. I don't see where the model "learns". Rather, I am looking at samples of populations and setting a cutoff based on some deviation. This is especially true when the training data does not contain any outliers. Where does the "learning" actually take place?
Many thanks
Victoria