C3_W1_Anomaly_Detection with 2 or more clusters

I am attempting to apply the concepts we have learned to two or more clusters generated using ‘make_blobs’ (as demonstrated in the example here: Comparing anomaly detection algorithms for outlier detection on toy datasets — scikit-learn 1.5.0 documentation).

Using the first dataset in the example above, I get comparable results, as shown below.

But the code doesn't do a good job when multiple blobs or clusters are present, as it assumes we are dealing with a single cluster. Could you provide general advice or an industry approach for customizing our code to detect anomalies when dealing with multiple clusters, assuming we are not using any scikit-learn anomaly detection algorithms?

Hi @francktchafa

To detect anomalies in datasets with multiple clusters, first apply a clustering algorithm such as K-means or DBSCAN to identify the different clusters. Then, for each cluster, calculate its centroid and spread (for example, its covariance matrix). Use a distance metric such as the Mahalanobis distance to measure how far each point lies from the centroid of its assigned cluster, and flag points whose distance exceeds a chosen threshold as anomalies.
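Here is a minimal sketch of that idea using only NumPy. It rests on several assumptions: the number of clusters `k` is known in advance, each cluster is roughly Gaussian, and a Mahalanobis distance of 3.0 is a sensible cutoff (that value is purely illustrative, so tune it on your own data). The function names `kmeans` and `mahalanobis_anomalies` and the toy data are made up for this example and are not taken from the course code.

```python
import numpy as np


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random restarts. Returns cluster labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # Start from k distinct data points chosen at random.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Distance from every point to every centroid: shape (n, k).
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Recompute centroids; keep the old one if a cluster is empty.
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        # Keep the restart with the lowest within-cluster sum of squares.
        inertia = ((X - centroids[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def mahalanobis_anomalies(X, labels, threshold=3.0):
    """Flag points whose Mahalanobis distance to their own cluster
    exceeds `threshold` (an illustrative default, not a standard value)."""
    distances = np.zeros(len(X))
    for j in np.unique(labels):
        members = labels == j
        pts = X[members]
        mean = pts.mean(axis=0)
        # A small ridge keeps the covariance matrix invertible.
        cov = np.cov(pts, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        diff = pts - mean
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        distances[members] = np.sqrt(d2)
    return distances > threshold, distances


# Toy data (made up): two tight blobs plus three planted outliers.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2))
blob_b = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(200, 2))
outliers = np.array([[0.0, 6.0], [10.0, 4.0], [5.0, 5.0]])
X = np.vstack([blob_a, blob_b, outliers])

labels = kmeans(X, k=2)
flags, distances = mahalanobis_anomalies(X, labels)
print("Flagged points:", int(flags.sum()), "of", len(X))
```

One caveat with this sketch: the outliers themselves are included when each cluster's mean and covariance are estimated, which inflates the spread a little. If that matters for your data, you could re-estimate the statistics after dropping the flagged points, or pick the threshold from a labelled validation set as we did in the course lab.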

Hope this helps! Keep in mind that there are other methods for these kinds of tasks, too!

Thanks for this clarification.

You’re welcome! Happy to help :raised_hands: