Anomaly Detection vs Supervised Learning

Hi, I was going through my first week of the unsupervised learning course.
I have a doubt about when to use anomaly detection versus supervised learning for classifying anomalies.
The first point is that if labels are given, we choose supervised learning; but if that is not the case, what are the factors on which we choose anomaly detection?
According to the lecture, anomaly detection can easily find new types of anomalies given only a small number of positive samples, but isn’t that the case for supervised learning as well?
It may be a stupid question, but it would be super helpful if someone could resolve this doubt for me.

Hi @sasukeuzumaki ,
Based on your comments, in short we can point out the following:

  • Supervised learning excels with labeled data, offering clear classification of both normal and anomalous data points.
  • Anomaly detection works well with unlabeled data, identifying deviations from the learned normal behavior. It can potentially detect entirely new anomalies but might struggle with false positives.
Keep learning!
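To make the second bullet concrete, here is a minimal sketch of density-based anomaly detection in the spirit of the course: fit a Gaussian to (mostly normal, unlabeled) training data, then flag points whose estimated density falls below a threshold epsilon. The data values and the threshold here are made up purely for illustration; in practice epsilon is tuned on a small labeled cross-validation set.

```python
import math

def fit_gaussian(xs):
    # Estimate mean and variance from (mostly normal) training data.
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def density(x, mu, var):
    # Gaussian probability density p(x) under the fitted parameters.
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical training set: mostly-normal readings around 10, no labels needed.
train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
mu, var = fit_gaussian(train)

epsilon = 1e-3  # illustrative density threshold
for x in [10.0, 25.0]:
    label = "anomaly" if density(x, mu, var) < epsilon else "normal"
    print(x, label)
```

Note that a point like 25.0 is flagged even though nothing like it appeared in training, which is exactly why this approach can catch entirely new kinds of anomalies.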

The key factor is whether you have a high enough proportion of anomaly (True) examples that training can be effective.

If the dataset is small or skewed such that there are few anomaly examples, a classifier can achieve a lower cost simply by predicting the False condition all the time, with little influence from the True examples.

There is no hard limit on what makes a dataset skewed (the total size of the dataset matters too), but one threshold you could consider is whether True makes up less than 5% of the examples.

For example, if you have 20 examples and only one is True, that’s a 1/20 ratio (5%). A classifier trained on this dataset would work very badly.
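A quick illustration of why that 1-in-20 split is dangerous, using a made-up labels list: a trivial "classifier" that always predicts False still scores 95% accuracy while missing every single anomaly, so a real model minimizing average error has little pressure to learn the True class.

```python
# Hypothetical skewed dataset: 19 normal (False) examples and 1 anomaly (True).
labels = [False] * 19 + [True]

# A degenerate "classifier" that ignores its input and always predicts False.
predictions = [False for _ in labels]

# Accuracy looks great, yet every anomaly is missed.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"accuracy = {accuracy:.0%}")  # prints "accuracy = 95%"
```

This is why, with heavily skewed labels, metrics like precision and recall on the True class are far more informative than raw accuracy.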