Understanding K-mean clusters

Basit_Kareem · January 7, 2023, 4:56pm

I have had some confusion about the size of the K cluster centroids. To start with, I created a set of random inputs, x1 and x2:
Screenshot (190)

I made a scatterplot of these inputs, as shown below
Screenshot (179)

From the figure above, it is safe to assume the number of clusters K = 3.

I then initialized my clusters using the following function:
Screenshot (187)

The plot shows the following result:
Screenshot (186)

And after several randomizations, just like Professor Andrew advised, I got this impressive plot:
Screenshot (188)

The professor said the number of columns of Mu_k would have to be the same as the number of features in our training sample. In my implementation here, I created two features and the length of my Mu_k = 2. Is that the proper implementation of the professor’s advice?
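
A minimal sketch of that shape requirement, assuming NumPy. The helper name `init_centroids` and the sample size of 100 are hypothetical, since the actual function and data are only in the screenshots above. With n = 2 features and K = 3, the centroids form a K x n array, so each individual centroid has length 2:

```python
import numpy as np

def init_centroids(X, K, seed=None):
    """Pick K distinct rows of X as initial centroids (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=K, replace=False)
    return X[idx]

# Two random features, stacked column-wise into a (100, 2) training set.
rng = np.random.default_rng(0)
x1 = rng.random(100)
x2 = rng.random(100)
X = np.column_stack((x1, x2))

mu = init_centroids(X, K=3, seed=0)
print(mu.shape)     # (3, 2): one row per cluster, one column per feature
print(mu[0].shape)  # (2,): a single centroid has as many entries as features
```

So a length of 2 for each Mu_k is consistent with two features; the number of rows is what changes with K.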

Can I go on to implement the complete algorithm?

AbdElRhaman_Fakhry · January 7, 2023, 6:11pm

Hi @Basit_Kareem

That is not yet a complete implementation for finding the best centroids, that is, the ones that minimize the distance between each point and the centroid of the cluster it is assigned to. You chose the initial centroids correctly, but you didn’t keep a list of the centroid points so that they can be updated by the k-means algorithm or any other clustering algorithm.
The 5 steps to create a K-means clustering model:

Step 1. Randomly pick k data points as the initial centroids, as you did.
Step 2. Find the distance (Euclidean distance for our purpose) between each data point in the training set and each of the k centroids.
Step 3. Assign each data point to the closest centroid according to the distances found.
Step 4. Update each centroid's location by taking the average of the points assigned to its cluster.
Step 5. Repeat Steps 2 to 4 until the centroids no longer change.
Course 3 of the specialization has an assignment on creating a K-means clustering model from scratch. I also think this link can help you: Create a K-Means Clustering Algorithm from Scratch in Python | by Turner Luke | Towards Data Science
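
The five steps above can be sketched roughly as follows. This is only an illustration assuming NumPy, not the course's reference solution; the function names, the `max_iters` cap, and the synthetic three-blob data are all assumptions made for the example.

```python
import numpy as np

def find_closest_centroids(X, centroids):
    # Step 2: Euclidean distance from every point to every centroid, shape (m, K).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # Step 3: index of the nearest centroid for each point.
    return np.argmin(dists, axis=1)

def compute_centroids(X, labels, centroids):
    # Step 4: move each centroid to the mean of the points assigned to it.
    new_centroids = centroids.copy()
    for k in range(centroids.shape[0]):
        members = X[labels == k]
        if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
            new_centroids[k] = members.mean(axis=0)
    return new_centroids

def run_kmeans(X, K, max_iters=100, seed=None):
    rng = np.random.default_rng(seed)
    # Step 1: pick K distinct data points as the initial centroids.
    centroids = X[rng.choice(X.shape[0], size=K, replace=False)].astype(float)
    # Step 5: repeat Steps 2 to 4 until the centroids stop moving.
    for _ in range(max_iters):
        labels = find_closest_centroids(X, centroids)
        new_centroids = compute_centroids(X, labels, centroids)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, find_closest_centroids(X, centroids)

# Synthetic data: three 2-feature blobs (made up for this example).
rng = np.random.default_rng(0)
blob_centers = ([0, 0], [5, 5], [0, 5])
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in blob_centers])

centroids, labels = run_kmeans(X, K=3, seed=0)
print(centroids.shape)  # (3, 2)
print(labels.shape)     # (150,)
```

A single run can still settle in a poor local optimum depending on the initial picks, which is why repeating it with several random initializations is commonly advised.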

Thanks!
Abdelrahman

Basit_Kareem · January 7, 2023, 7:36pm

Yes. I understand. That was why I said

Since I was only trying to see what’s going on, I had to manually iterate over different random picks.

I understand that in actual implementation, I will have to follow the algorithm as you listed.

Thanks for your contribution

AbdElRhaman_Fakhry · January 7, 2023, 7:37pm

You are welcome, and feel free to ask any other questions.
Cheers!
Abdelrahman

Basit_Kareem · January 7, 2023, 7:59pm

Thanks.

I will try to do the full implementation and post the result here.
