Normalize or ... in K-means?

Should we normalize data before running K-means?

I don't think it's a necessity just for applying K-means, but if you need to use that data downstream as labeled data in supervised ML, then it's good to have it normalized.

It depends on the situation. If the features in your dataset already share the same scale, then, as Gent mentioned, normalizing them may not be necessary. However, when the features have different scales, it is usually a good idea to normalize them before running K-means. The reason is that K-means assigns points to clusters by Euclidean distance, and that distance is dominated by whichever features have the largest numeric range. A feature measured in the tens of thousands will outweigh one measured in the tens, regardless of which one actually matters. Normalizing puts the features on a comparable scale so each contributes to the distance more evenly, which usually gives more meaningful clusters. Just remember to consider the specific characteristics of your dataset and the problem you're working on before deciding whether to normalize.
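
To make the scale effect concrete, here is a minimal sketch in plain Python. The three points and the feature names (age, income) are made up for illustration, and z-score standardization is only one of several possible ways to normalize. It compares Euclidean distances before and after scaling:

```python
import math
import statistics

# Hypothetical toy data: (age in years, income in dollars).
points = {
    "a": (25.0, 50_000.0),
    "b": (60.0, 50_500.0),
    "c": (26.0, 50_800.0),
}

def standardize(rows):
    """Z-score each column: subtract its mean, divide by its population std."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]

names = list(points)
raw = dict(points)
scaled = dict(zip(names, standardize(list(points.values()))))

# Euclidean distances from point "a" to the other two points.
raw_ab = math.dist(raw["a"], raw["b"])
raw_ac = math.dist(raw["a"], raw["c"])
scaled_ab = math.dist(scaled["a"], scaled["b"])
scaled_ac = math.dist(scaled["a"], scaled["c"])

print(f"raw:    d(a,b)={raw_ab:.1f}  d(a,c)={raw_ac:.1f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

On the raw values, a looks closer to b because the income column swamps the 35-year age gap. After standardizing, a is closer to c, which has nearly the same age. K-means would see the same shift when assigning points to centroids. In practice you would probably use a library scaler (for example scikit-learn's StandardScaler) rather than hand-rolling this.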