Hi, I was watching the first video on anomaly detection in week 1 of course 3. While Andrew Ng was discussing the aircraft example, I suddenly couldn’t see how anomaly detection differs from classification.

In binary classification, we use features to decide whether a test example belongs to the positive class or the negative class. Anomaly detection also uses features, in this case to detect anomalies.

My other question is: in what sense is anomaly detection unsupervised, given that it uses features just like a classification task?

Learning is unsupervised when you don’t give the algorithm any training labels. Both supervised and unsupervised learning use features.

In binary classification, the learning algorithm is given the labels, so it knows which samples are positive and which are negative. It then tries to draw a boundary in the feature space that best separates the positive samples from the negative ones.

In our anomaly detection example, the algorithm does not know which samples are normal and which are anomalous. We assume that most samples are normal and that normal samples share similar feature values, which means they tend to gather in a certain area of the feature space. In contrast, samples that are far away from the majority are more likely to be different (anomalous).

Since this idea is about where in the feature space samples tend to gather, it is a frequency problem, or a probabilistic problem. When samples gather in a certain area of the space, there is a high probability of finding samples in that area.

So, we fit Gaussian distributions to model the probability of samples showing up in each part of the feature space, where the mean parameters of the distributions indicate the most crowded spot. At this point, we have already learned the model parameters (the means and the variances) from our data (features) without knowing the labels in advance.
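
To make the "no labels needed" point concrete, here is a minimal sketch of the fitting step. It assumes one independent Gaussian per feature; the function name `fit_gaussian` and the synthetic data are made up for illustration and are not from the course material. Notice that only the feature matrix `X` is passed in, never a label vector.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate a per-feature mean and variance from unlabeled data X (m x n)."""
    mu = X.mean(axis=0)   # center of the crowd along each feature
    var = X.var(axis=0)   # spread along each feature (divides by m)
    return mu, var

# Hypothetical unlabeled data: 1000 samples, 2 features.
rng = np.random.default_rng(0)
X = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(1000, 2))

mu, var = fit_gaussian(X)
print("means:", mu)
print("variances:", var)
```

The estimated means should land close to the most crowded spot of the data, which is all the model needs in order to say where samples are likely to appear.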

Given the learned parameters, we can tell, at each point in the feature space, how likely we are to see a sample there. If a sample falls in an area of high probability, it is seen as normal. If it falls somewhere the Gaussian model considers very unlikely, it is seen as anomalous.
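
The decision step can be sketched roughly as follows. This assumes independent per-feature Gaussians whose parameters are already known; the parameter values, the two test samples, and the threshold `epsilon` are hypothetical and chosen only for illustration. In practice the threshold would have to be tuned, for example on a small set of examples whose status is known.

```python
import numpy as np

def gaussian_density(X, mu, var):
    """Product of independent per-feature Gaussian densities for each row of X."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    expo = np.exp(-((X - mu) ** 2) / (2.0 * var))
    return np.prod(coef * expo, axis=1)

# Hypothetical learned parameters (means and variances per feature).
mu = np.array([5.0, 10.0])
var = np.array([1.0, 4.0])

# Hypothetical threshold: densities below this are flagged as anomalous.
epsilon = 1e-3

samples = np.array([
    [5.1, 9.8],    # close to the crowded area
    [12.0, 25.0],  # far away from the majority
])

p = gaussian_density(samples, mu, var)
is_anomaly = p < epsilon
print(p)
print(is_anomaly)
```

The first sample sits near the means and gets a comparatively high density, while the second is far from the crowd and falls below the threshold.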