Anomaly Detection: Assuming most of your cross validation examples will be y = 0

Hi there,

In the “Developing and evaluating an anomaly detection system” lecture in Week 1, it was explained that you will typically have only a small number of cross validation examples with labels y = 1 (i.e. anomalous).

It was also suggested that, for the rest of the cross validation examples, you assume their labels are y = 0 (i.e. not anomalous).

In the lab, I noticed that labels were given for all the cross validation examples, but that most of them are 0. Is that for the reasons explained above?

Furthermore, I am wondering why it does not matter if some of the cross validation examples we assume are y = 0 actually turn out to be y = 1 (i.e. anomalies) in the real world. Andrew mentioned not to worry if that assumption is broken, but it wasn’t clearly explained why it would not matter.

Thanks for your help in advance.

Yes.

Because there are so few anomalies relative to the size of the cross validation set, mislabeling a handful of them as y = 0 does not pollute the evaluation statistics enough to matter.
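
Here is a minimal sketch of that idea, assuming the usual setup from the lab: `p_val` holds the model's estimated probabilities p(x) on the cross validation set, `y_val` holds the labels, and the threshold epsilon is chosen to maximize F1. The data, the `select_threshold` helper, and the flipped labels below are all made up for illustration; the point is just that hiding a couple of true anomalies among the assumed y = 0 examples barely moves the metric and leaves the chosen epsilon essentially unchanged.

```python
import numpy as np

def select_threshold(y_val, p_val):
    """Pick the epsilon that maximizes F1 on the cross validation set."""
    best_epsilon, best_f1 = 0.0, 0.0
    for epsilon in np.linspace(p_val.min(), p_val.max(), 1000):
        preds = (p_val < epsilon).astype(int)   # flag as anomaly when p(x) < epsilon
        tp = np.sum((preds == 1) & (y_val == 1))
        fp = np.sum((preds == 1) & (y_val == 0))
        fn = np.sum((preds == 0) & (y_val == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_epsilon = f1, epsilon
    return best_epsilon, best_f1

rng = np.random.default_rng(0)

# Toy cross validation set: 1000 examples, 20 known anomalies, the rest assumed y = 0.
y_val = np.zeros(1000, dtype=int)
y_val[:20] = 1
# The known anomalies get very low p(x); the assumed-normal examples get much higher p(x).
p_val = np.concatenate([rng.uniform(0, 1e-4, 20), rng.uniform(0.05, 1.0, 980)])

eps_clean, f1_clean = select_threshold(y_val, p_val)

# Now suppose 2 of the examples we assumed were y = 0 are really anomalies we never caught.
y_hidden = y_val.copy()
y_hidden[rng.choice(np.arange(20, 1000), size=2, replace=False)] = 1
eps_hidden, f1_hidden = select_threshold(y_hidden, p_val)

print(f"assumed labels:     F1 = {f1_clean:.3f} at epsilon = {eps_clean:.2e}")
print(f"2 hidden anomalies: F1 = {f1_hidden:.3f} at epsilon = {eps_hidden:.2e}")
```

Running this toy example, the F1 score only drops slightly (from 1.0 to roughly 0.95 here) and the selected epsilon is essentially the same, so a few examples wrongly assumed to be y = 0 don't change which threshold you pick or the conclusions you draw from the evaluation.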