Course 3 week 1 : initializing K means vs choosing the right K

mehmet_baki_deniz · April 12, 2023, 11:33am

hi,

Ng explains methods for choosing the right value of K, and also the importance of running K-means with multiple random initializations to get the best clustering (one that gives pure and seemingly logical clusters).

But how do we do both, and which one should I do first? Should I first choose the number of clusters using domain knowledge and then move on to the multiple-initialization process?
It becomes confusing if I want to use the elbow method to find the optimum K value.

Mujassim_Jamal · April 12, 2023, 12:30pm

Hi @mehmet_baki_deniz, that’s a good question!

I would suggest first narrowing K down to a range of candidate values, using domain knowledge or a method such as the elbow method. Then, for each candidate K, run K-means with multiple random initializations and keep the run with the lowest cost, so that each K is represented by its best clustering. Finally, choose among the candidate K values by comparing the quality of their clusterings with a suitable metric.
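To make the order of operations concrete, here is a minimal NumPy sketch of that procedure. The toy data, the helper names run_kmeans and best_of_n, and the values n_init=30 and n_iters=50 are all assumptions made for illustration; they are not taken from the course labs.

```python
import numpy as np

def run_kmeans(X, k, rng, n_iters=50):
    """One K-means run from one random initialization.

    Returns (centroids, labels, cost), where cost is the mean squared
    distance from each point to its assigned centroid.
    """
    X = np.asarray(X, dtype=float)
    # Initialize centroids as k distinct training examples chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (skip empty clusters).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # Recompute assignments and cost for the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    cost = np.mean(dists[np.arange(len(X)), labels] ** 2)
    return centroids, labels, cost

def best_of_n(X, k, n_init, rng):
    """Run K-means n_init times for a fixed k and keep the lowest-cost run."""
    best = None
    for _ in range(n_init):
        result = run_kmeans(X, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best

rng = np.random.default_rng(0)
# Synthetic toy data (an assumption for illustration): three blobs in 2-D.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

# Outer loop over candidate K values, inner loop over random initializations.
costs = {}
for k in range(1, 7):
    _, _, cost = best_of_n(X, k, n_init=30, rng=rng)
    costs[k] = cost

for k, cost in costs.items():
    print(k, round(cost, 3))
```

The multiple initializations sit inside the loop over K, so the elbow plot compares the best run found for each K and not one arbitrary run per K.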

One thing to note: there is no guarantee that the optimum clustering will always give pure clusters, and what counts as optimum also depends on how you interpret the result with your domain knowledge.

Best Regards,
Mujassim

mehmet_baki_deniz · April 12, 2023, 12:44pm

Thank you, Mujassim.
I checked the scikit-learn documentation. The scikit-learn model has a parameter called "n_init".

It seems the model runs K-means n_init times with different initializations and keeps the best result for the given K. So we can just set it to 100, as Ng advises, and then decide which K value to use with an appropriate method.

Would you (guys) agree with this interpretation?

Mujassim_Jamal · April 12, 2023, 1:23pm

You can set the n_init parameter higher if you want to increase the chance of finding a better solution, and try a range of n_clusters values. You can then use an appropriate method to decide on K. Also, scikit-learn's KMeans handles centroid initialization automatically for each K value, so you only have to take care of the number of clusters.
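As a rough sketch of what that could look like with scikit-learn: the toy data, the candidate range 2 to 6, and the use of the silhouette score as the comparison metric are assumptions for illustration, and other metrics may suit your data better.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic toy data (an assumption for illustration): three blobs in 2-D.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

results = {}
for k in range(2, 7):
    # n_init=100: KMeans is run 100 times from different centroid seeds and
    # the run with the lowest inertia is kept for this value of k.
    km = KMeans(n_clusters=k, n_init=100, random_state=0).fit(X)
    results[k] = (km.inertia_, silhouette_score(X, km.labels_))

for k, (inertia, sil) in results.items():
    print(k, round(inertia, 2), round(sil, 3))

# Pick the K with the highest silhouette score (one possible criterion).
best_k = max(results, key=lambda k: results[k][1])
print("best K by silhouette:", best_k)
```

The inertia_ values can be plotted against K for the elbow method, while the silhouette score gives a second opinion when the elbow is ambiguous.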

Please feel free to share the results.

Best Regards,
Mujassim
