Hey @Joy_Ong,
Let me ask you this: doesn’t that sound counter-intuitive to you? If we consider the problem we are solving, we need to find anomalies in a dataset for which we don’t have any labels. If we use an “all-normal” dataset as you mentioned, doesn’t that mean you already have labels for the dataset? And if so, why are we solving the problem at all?
That said, this is indeed something to ponder! And yes, anomalous examples will skew the mean and variance, but we need to ask “to what extent?”. If you have a dataset of 307 examples (as in the lab) and, say, 7 of them are anomalous, then the percentage of anomalous examples is $\frac{7}{307} \times 100 \approx 2.28\%$. Such a small fraction of examples won’t have much effect on the mean and variance, and hence we can still model the entire dataset with a Gaussian distribution.
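To make that concrete, here is a minimal toy sketch in NumPy (the numbers are made up for illustration; this is not the lab's dataset). It estimates the Gaussian parameters with and without a small handful of extreme points and compares them:

```python
import numpy as np

# Toy data: 300 "normal" points plus 7 extreme ones,
# roughly the 2.28% contamination discussed above.
rng = np.random.default_rng(0)
normal = rng.normal(loc=10.0, scale=2.0, size=300)
anomalies = rng.normal(loc=20.0, scale=1.0, size=7)
x = np.concatenate([normal, anomalies])

# Gaussian parameter estimation, as in the lab's estimate_gaussian step
mu_clean, var_clean = normal.mean(), normal.var()
mu_all, var_all = x.mean(), x.var()

print(f"mean:     {mu_clean:.2f} (normal only) vs {mu_all:.2f} (with anomalies)")
print(f"variance: {var_clean:.2f} (normal only) vs {var_all:.2f} (with anomalies)")
# The mean barely moves and, even with the somewhat inflated variance, the 7
# extreme points still sit roughly 4 standard deviations out, so p(x) stays
# tiny for them and they would still be flagged.
```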
Here’s another question you might wonder about: what if, say, 50% of our dataset is anomalous? In that case, the effect on the mean and variance would be serious, and we might have to look into other techniques. I don’t have much experience with such scenarios, but clustering techniques come to mind, and they have been discussed in the course as well. What do you think about this scenario?
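Just to illustrate what I mean (this is my own rough sketch, not something from the course lab): with that much contamination, the anomalous behaviour is common enough to form its own cluster, so something like K-means can separate the two groups, and you can then inspect each cluster instead of fitting a single Gaussian:

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2D data: ~50% "normal" behaviour and ~50% anomalous behaviour.
rng = np.random.default_rng(1)
normal_like = rng.normal([0.0, 0.0], 1.0, size=(150, 2))
anomalous_like = rng.normal([6.0, 6.0], 1.0, size=(150, 2))
X = np.vstack([normal_like, anomalous_like])

# Two clusters, since we suspect two distinct behaviours in the data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

for k in range(2):
    size = np.sum(labels == k)
    centre = kmeans.cluster_centers_[k]
    print(f"cluster {k}: {size} points, centre {np.round(centre, 2)}")

# Deciding which cluster is the "anomalous" one still needs domain knowledge
# (or a few labelled examples), which is the genuinely hard part here.
```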
I guess that answers your first question, so let’s move on to the next one. Here, I think the answer depends on how the anomalies are defined, i.e., on the task in which you need to apply anomaly detection.
For instance, suppose the task is laid out as follows: we have $n$ computers, each computer runs its own independent processes, and we need to find out which process was an anomaly and on which computer. In that case, we should train a separate anomaly detection model for each computer, using the metadata about the processes handled by that computer only.
However, if we are treating the computers themselves as the entities of interest rather than the processes, i.e., we need to find out which computer is behaving anomalously (given that some of them run independent tasks and some take part in distributed tasks), then we should train a single anomaly detection model for all computers, using the metadata about the processes handled across all of them.
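Here’s a rough sketch of both setups using the lab-style per-feature Gaussian; the data, the names, and the way I score computers in the second case are my own assumptions, just to show the structure:

```python
import numpy as np

def estimate_gaussian(X):
    """Per-feature mean and variance, as in the course lab."""
    return X.mean(axis=0), X.var(axis=0)

def log_prob(X, mu, var):
    """Sum of per-feature Gaussian log densities for each row of X."""
    return np.sum(-0.5 * np.log(2 * np.pi * var) - (X - mu) ** 2 / (2 * var), axis=1)

# Hypothetical process metadata: a few numeric features per process, per computer.
rng = np.random.default_rng(0)
process_data = {cid: rng.normal(size=(500, 4)) for cid in ("pc_A", "pc_B", "pc_C")}

# Case 1: find anomalous *processes* on each computer ->
# one model per computer, fitted only on that computer's own processes.
per_computer_models = {cid: estimate_gaussian(X) for cid, X in process_data.items()}

# Case 2: find anomalous *computers* ->
# a single model fitted on the pooled process metadata from every computer.
pooled = np.vstack(list(process_data.values()))
mu, var = estimate_gaussian(pooled)

# One possible way (my assumption, not from the course) to then score a computer:
# the average log-probability of its processes under the pooled model.
computer_scores = {cid: log_prob(X, mu, var).mean() for cid, X in process_data.items()}
print(computer_scores)
```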
Let me know if this helps.
Cheers,
Elemento