Anomaly algorithm - video difference

Hello, I’m writing because I’ve noticed a discrepancy, or at least what I understood to be one, between the explanations in two videos:
Video 1: Developing and evaluating anomaly detection
Video 2: Anomaly detection vs supervised learning.

Video 1 says you should choose a dataset to validate that the algorithm performs correctly, and divide it among the training, cross-validation, and test sets. It labels a good engine as y = 0 and an anomalous engine as y = 1.

But then Video 2 indicates that you should choose anomaly detection when you have many more negative examples than positive ones.

I would like to understand this more deeply, because it’s confusing when you listen to one video and then the other.

Thanks.
Regards.
Gus

Hi @gmazzaglia

The key difference between the two videos lies in the context of the datasets they refer to:

  • Video 1 explains the process of developing and evaluating an anomaly detection algorithm by splitting the dataset into training, cross-validation, and test sets, labeling a good engine as y = 0 and an anomalous engine as y = 1 (see the sketch after this list).

  • Video 2 highlights that anomaly detection is typically used when there are many more negative examples (normal cases, y = 0) than positive examples (anomalies, y = 1). This imbalance is a common scenario in anomaly detection problems.
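
To make the split from Video 1 concrete, here is a minimal NumPy sketch. The array names and example counts (10,000 normals, 20 anomalies) are my own illustrative assumptions, not numbers from the lecture:

```python
import numpy as np

# Made-up counts in the spirit of the lecture: 10,000 normal engines (y = 0)
# and only 20 known anomalous engines (y = 1).
rng = np.random.default_rng(0)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(10_000, 2))  # stand-in features
X_anom = rng.normal(loc=5.0, scale=1.0, size=(20, 2))        # stand-in anomalies

# Training set: normal examples only -- the model is fit on these.
X_train = X_normal[:6_000]

# CV and test sets: the remaining normals plus half of the labeled anomalies
# each, so both sets contain a few y = 1 examples to evaluate against.
X_cv = np.vstack([X_normal[6_000:8_000], X_anom[:10]])
y_cv = np.concatenate([np.zeros(2_000), np.ones(10)])
X_test = np.vstack([X_normal[8_000:], X_anom[10:]])
y_test = np.concatenate([np.zeros(2_000), np.ones(10)])
```

The point is that the model only ever trains on normal data; the scarce labeled anomalies are reserved for the CV and test sets so you can measure how well the algorithm flags them.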

Hope it helps!

Hi @Alireza_Saei, thanks for your reply. Could you please explain it more deeply?
I mean, based on what I’ve learned, an anomaly occurs when an example falls outside the normal/Gaussian distribution; the mean and standard deviation define what “everything is OK” looks like, so anything far outside that range is an anomaly.
If I have a lot of negative examples, how can I detect an anomaly? They will all belong to the normal distribution.

Thanks.
Regards.
Gus

Hi @gmazzaglia,

That’s a good question! Anomaly detection is primarily used in scenarios where anomalies are rare compared to normal instances, which is why they are called anomalies.

Having more negative examples (normal cases) is actually beneficial because it helps you accurately model what normal looks like. The more data you have, the better you can understand the distribution of your normal data, leading to more accurate anomaly detection.
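
As a rough sketch of why the negatives help, here is a minimal per-feature Gaussian density example in NumPy. The data, the feature count, and the epsilon value are all made-up assumptions for illustration:

```python
import numpy as np

# Made-up data: 6,000 normal examples to fit on, and a CV set that mixes
# 2,000 normals with 10 injected anomalies drawn far from the normal cluster.
rng = np.random.default_rng(1)
X_train = rng.normal(0.0, 1.0, size=(6_000, 2))
X_cv = np.vstack([rng.normal(0.0, 1.0, size=(2_000, 2)),
                  rng.normal(5.0, 1.0, size=(10, 2))])

# Fit a per-feature Gaussian to the normal training data: the more negative
# examples you have, the better these estimates of "normal" become.
mu = X_train.mean(axis=0)
var = X_train.var(axis=0)

def gaussian_prob(X, mu, var):
    """p(x) as a product of independent per-feature Gaussian densities."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    return np.prod(coef * np.exp(-((X - mu) ** 2) / (2.0 * var)), axis=1)

# Flag anything whose density falls below epsilon; in the course, epsilon
# is tuned on the CV set (e.g. by maximizing F1). Here it is just a guess.
p_cv = gaussian_prob(X_cv, mu, var)
epsilon = 1e-4
flagged = p_cv < epsilon  # True = predicted anomaly (y = 1)
```

The anomalies are detectable precisely because they land in the low-density tails of the distribution fitted to the negatives.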

However, if anomalies make up a significant portion of your dataset, it shifts the problem from traditional anomaly detection to a classification problem. In such cases, you would use supervised learning techniques to classify the data into normal and anomalous categories.
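
For contrast, here is a minimal sketch of the supervised alternative, assuming scikit-learn and made-up data in which anomalies are plentiful enough to learn from directly:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data where anomalies are no longer rare (300 of 800 examples),
# so there is enough signal to learn the positive class directly.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 2)),
               rng.normal(3.0, 1.0, size=(300, 2))])
y = np.concatenate([np.zeros(500), np.ones(300)])

# A plain supervised classifier replaces the density-threshold approach.
clf = LogisticRegression().fit(X, y)
preds = clf.predict(X[:5])  # predicted 0/1 labels for a few examples
```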

Hope this helps! Feel free to ask if you need further assistance.

Negative examples are by definition not anomalies.

Thanks, @Alireza_Saei, now I understand.

Regards.
Gus

You’re welcome, happy to help :raised_hands: