How to evaluate/visualize clusters derived through PCA

hi,

the title might not be the best way to address my question. Here is my problem:

I have a data set with 21 features, and I want to cluster the data to see whether there are any insights I can draw from the clusters.

I started the process with PCA, reduced the data to 2 components, and then trained a k-means clustering model with 4 cluster centers.

When I visualize it, my clusters look nice and tidy based on the 2 components. The problem I cannot solve is how to go from those two components back to the 21 features in my original set.

That is important since I need to see which of those features are important and carry out the analysis in terms of those features instead of the 2 components.

Mehmet


Hi @mehmet_baki_deniz

since you have performed a compression (and probably not all of the variance is explained by those 2 components), you lost some information and cannot exactly reconstruct the original features. But of course you can transform your data back to the original space: feel free to take a look at inverse_transform() in sklearn.decomposition.PCA — scikit-learn 1.3.2 documentation.
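
A minimal sketch of that workflow (with random placeholder data and parameter choices just for illustration, not your actual pipeline) could look like this:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = np.random.rand(500, 21)            # placeholder for your 21-feature data set

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_2d)

# Map the 2-D points (or the 4 cluster centers) back to the 21-feature space.
# This is only an approximation: the variance not captured by the 2 PCs is lost.
X_reconstructed = pca.inverse_transform(X_2d)                   # shape (500, 21)
centers_21d = pca.inverse_transform(kmeans.cluster_centers_)    # shape (4, 21)
```

Inverse-transforming the 4 cluster centers gives you one (approximate) 21-feature profile per cluster, which you can then compare feature by feature.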

Also, a PCA reconstruction was discussed in this thread; see also this repo with the MNIST dataset.

You could also check the cumulative variance which is explained by:

  • PC 1
  • PC 2

see also this repo and the sketch below. I would expect PC1 to have better clustering capabilities than PC2, and the residual information gain per PC would decrease if you used more components.
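
A quick way to check that (again just a sketch with placeholder data):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(500, 21)             # placeholder for your 21-feature data set

pca = PCA().fit(X)                      # keep all components for the full picture

print(pca.explained_variance_ratio_[:2])            # variance explained by PC1 and PC2
print(np.cumsum(pca.explained_variance_ratio_))     # cumulative curve over all 21 PCs
```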

Hint: have you already conducted an elbow analysis or calculated a silhouette score for your clustering problem? See also this blog post.
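
For example (a sketch, assuming your 2-component data; adjust the range of k as needed):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X_2d = np.random.rand(500, 2)           # placeholder for your 2-PC data

# Elbow analysis (inertia) plus silhouette score for a range of cluster counts.
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_2d)
    print(k, km.inertia_, silhouette_score(X_2d, km.labels_))
```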

Feel free to add a plot and also some more context regarding the problem you are solving.

Best regards
Christian
