At work I have faced an issue with telecommunications radio KPIs. I was asked to run a regression on KPI values but, in the dataset I was given, there were some records with an unexpected pattern: they looked like they were coming from faulty radio equipment.
Specifically, I need to work with packet loss and channel load. Higher channel load is expected to lead to higher packet loss, but I sometimes see low or medium channel load paired with high packet loss, and this is due to hardware errors.
The problem is that both packet loss and channel load are percentages between 0 and 100%, so there are no outrageous values such as -10% or 120%.
It also needs to be taken into account that values around 10% or 90% are not outliers on their own; every value between 0% and 100% is an expected value.
The only anomalous records are those coming from the combination of low-to-medium channel load and high packet loss.
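To make the pattern concrete, here is a naive rule-based flag in Python; the 50% and 60% thresholds are only illustrative guesses, not values taken from my real data:

```python
import pandas as pd

# Toy example: flag records where channel load is low-to-medium
# but packet loss is high. The 50% / 60% thresholds are only
# illustrative guesses, not values derived from the real data.
df = pd.DataFrame({
    "channel_load": [10, 45, 80, 95, 30],   # percent
    "packet_loss":  [ 2, 70,  5, 60, 85],   # percent
})

suspect = (df["channel_load"] < 50) & (df["packet_loss"] > 60)
print(df[suspect])  # candidate faulty-equipment records
```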
What task were you asked to do?
Are you trying to create a model of the typical performance?
Or are you trying to identify the faulty equipment so it can be repaired?
This course doesn’t discuss “data cleaning” at all. It’s a difficult topic all its own.
One approach you might take is to use the anomaly detection method that is discussed in MLS Course 3 to identify the examples you want to remove.
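As a minimal sketch of that style of density-based detection, assuming the two features are channel load and packet loss: estimate a per-feature Gaussian from the data and flag low-density points. The data and the epsilon value below are placeholders you would tune on your own set.

```python
import numpy as np

def estimate_gaussian(X):
    """Per-feature mean and variance of the training data."""
    mu = X.mean(axis=0)
    var = X.var(axis=0)
    return mu, var

def gaussian_prob(X, mu, var):
    """Product of independent per-feature Gaussian densities."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    exponent = -((X - mu) ** 2) / (2.0 * var)
    return np.prod(coef * np.exp(exponent), axis=1)

# X: rows of [channel_load, packet_loss] in percent (placeholder data)
X = np.array([[80.0, 70.0], [85.0, 75.0], [20.0, 90.0], [75.0, 65.0]])
mu, var = estimate_gaussian(X)
p = gaussian_prob(X, mu, var)
epsilon = 1e-3  # threshold; in practice tuned, e.g. on a labeled validation set
print(X[p < epsilon])  # candidate anomalies to inspect before removing
```

Note that a plain density model will also flag rare but valid extremes, which relates to the caution below.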
But you have to be careful that you don’t remove examples which are important.
If you create a model that is trained only on a cleaned data set, that model might not predict well on new data, which may still include anomalies.
I would probably start with a visualization, followed by a statistical analysis. The question would be whether it is possible to describe, e.g. with two features (or more?), the characteristic anomaly pattern you described as a combination of (see the plotting sketch after this list):
low-medium channel load
and high packet loss.
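A quick scatter plot is usually enough for that first look. This sketch uses synthetic placeholder data in place of your real columns:

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data: replace with the real channel_load / packet_loss columns.
rng = np.random.default_rng(0)
channel_load = rng.uniform(0, 100, 500)
packet_loss = np.clip(channel_load * 0.8 + rng.normal(0, 10, 500), 0, 100)

plt.scatter(channel_load, packet_loss, s=10, alpha=0.5)
plt.xlabel("Channel load (%)")
plt.ylabel("Packet loss (%)")
plt.title("Packet loss vs. channel load")
plt.show()
```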
If so, you could fit e.g. a Gaussian mixture model in the next step and evaluate its capabilities within your feature space with relevant metrics and a residual analysis.
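A minimal sketch with scikit-learn's GaussianMixture, again on synthetic placeholder data; the number of components and the 1% density cutoff are assumptions you would tune:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# X: (n_samples, 2) array of [channel_load, packet_loss]; synthetic here.
rng = np.random.default_rng(1)
load = rng.uniform(0, 100, 1000)
loss = np.clip(load * 0.8 + rng.normal(0, 10, 1000), 0, 100)
X = np.column_stack([load, loss])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
log_density = gmm.score_samples(X)          # per-sample log-likelihood
threshold = np.percentile(log_density, 1)   # flag the 1% least likely points
anomalies = X[log_density < threshold]
```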
I would expect that after a first visualization you could judge whether your features are sufficient to solve your business problem in an acceptable way. If not, following the CRISP-DM methodology in an iterative way and, e.g., enhancing your features might be a good option.
This is also a possibility! I understand that in this case you only want to use "normal data" to train your anomaly detection model in an unsupervised way. This approach was described here:
For example, a popular approach is to learn your normal behaviour as a "normal cluster" and, if a certain data point is too far away from this cluster, to conclude it is an anomaly.
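A minimal sketch of that idea, assuming you can isolate a set of healthy records first; the single-centroid model and the 99th-percentile cutoff are illustrative choices, not prescriptions:

```python
import numpy as np
from sklearn.cluster import KMeans

# X_normal: records believed to be healthy; X_new: points to score. Synthetic here.
rng = np.random.default_rng(2)
X_normal = rng.normal(loc=[70, 60], scale=5, size=(500, 2))
X_new = np.array([[72.0, 61.0], [20.0, 95.0]])

# Model the "normal cluster" with a single centroid fit on normal data only.
km = KMeans(n_clusters=1, n_init=10, random_state=0).fit(X_normal)
dist = np.linalg.norm(X_new - km.cluster_centers_[0], axis=1)

# Distance cutoff derived from the normal data itself (99th percentile).
cutoff = np.percentile(
    np.linalg.norm(X_normal - km.cluster_centers_[0], axis=1), 99
)
print(X_new[dist > cutoff])  # flagged as too far from normal behaviour
```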
Autoencoders, for example, are a popular choice for anomaly detection if you have a sufficient amount of normal data and the problem is suited to it.
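A rough sketch with Keras, where the layer sizes, epoch count, and synthetic data are all placeholder assumptions: train the autoencoder on normal data only, then score new points by reconstruction error.

```python
import numpy as np
import tensorflow as tf

# Train only on data assumed to be normal; a high reconstruction error on
# a new point then suggests an anomaly. Data here is synthetic.
rng = np.random.default_rng(3)
X_normal = rng.normal(loc=[0.7, 0.6], scale=0.05, size=(1000, 2)).astype("float32")

autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="relu"),   # bottleneck
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2),                      # reconstruct the inputs
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_normal, X_normal, epochs=20, batch_size=32, verbose=0)

X_test = np.array([[0.71, 0.61], [0.2, 0.95]], dtype="float32")
recon = autoencoder.predict(X_test, verbose=0)
error = np.mean((X_test - recon) ** 2, axis=1)
print(error)  # larger error => more anomalous
```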