Anomaly Detection: Assuming most of your cross validation examples will be y = 0

Hi there,

In the “Developing and evaluating an anomaly detection system” lecture in Week 1, it was explained that you will typically have only a small number of cross validation examples with labels y = 1 (i.e. anomalous).

It was also suggested that, for the rest of the cross validation examples, you assume their labels are y = 0 (i.e. not anomalous).

In the lab, I noticed that labels were given for all the cross validation examples, but that most of them are 0. Is that for the reasons explained above?

Furthermore, I am wondering why it does not matter if some of the cross validation examples we assume are y = 0 actually turn out to be y = 1 (i.e. anomalies) in the real world. Andrew mentioned not to worry if that assumption is broken, but it wasn’t clearly explained why it would not matter.

Thanks for your help in advance.

Yes.

Because there are so few anomalies relative to the size of the cross validation set, mislabeling a handful of them as y = 0 does not pollute the evaluation statistics enough to matter.
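
Here is a minimal sketch of that idea, assuming the usual setup from the lab: `p_val` holds the model's estimated probabilities p(x) on the cross validation set, `y_val` holds the labels, and the threshold epsilon is chosen to maximize F1. The data, the `select_threshold` helper, and the flipped labels below are all made up for illustration; the point is just that hiding a couple of true anomalies among the assumed y = 0 examples barely moves the metric and leaves the chosen epsilon essentially unchanged.

```python
import numpy as np

def select_threshold(y_val, p_val):
    """Pick the epsilon that maximizes F1 on the cross validation set."""
    best_epsilon, best_f1 = 0.0, 0.0
    for epsilon in np.linspace(p_val.min(), p_val.max(), 1000):
        preds = (p_val < epsilon).astype(int)   # flag as anomaly when p(x) < epsilon
        tp = np.sum((preds == 1) & (y_val == 1))
        fp = np.sum((preds == 1) & (y_val == 0))
        fn = np.sum((preds == 0) & (y_val == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_epsilon = f1, epsilon
    return best_epsilon, best_f1

rng = np.random.default_rng(0)

# Toy cross validation set: 1000 examples, 20 known anomalies, the rest assumed y = 0.
y_val = np.zeros(1000, dtype=int)
y_val[:20] = 1
# The known anomalies get very low p(x); the assumed-normal examples get much higher p(x).
p_val = np.concatenate([rng.uniform(0, 1e-4, 20), rng.uniform(0.05, 1.0, 980)])

eps_clean, f1_clean = select_threshold(y_val, p_val)

# Now suppose 2 of the examples we assumed were y = 0 are really anomalies we never caught.
y_hidden = y_val.copy()
y_hidden[rng.choice(np.arange(20, 1000), size=2, replace=False)] = 1
eps_hidden, f1_hidden = select_threshold(y_hidden, p_val)

print(f"assumed labels:     F1 = {f1_clean:.3f} at epsilon = {eps_clean:.2e}")
print(f"2 hidden anomalies: F1 = {f1_hidden:.3f} at epsilon = {eps_hidden:.2e}")
```

Running this toy example, the F1 score only drops slightly (from 1.0 to roughly 0.95 here) and the selected epsilon is essentially the same, so a few examples wrongly assumed to be y = 0 don't change which threshold you pick or the conclusions you draw from the evaluation.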