How exactly does k-means know that a cluster centroid is closer to a given set of data points?

In k-means we don't know in advance whether a point will lie close to the $k^{\text{th}}$ cluster or not. That is, it has to check the distance to every centroid.

In the example shared in the following pic, how is it determined which cluster centroid is closest to a data point so that its index can be assigned?

Is it that each point's distance to every cluster centroid is calculated, and then argmin is used to determine which centroid is nearest to the data point?

Hello @tbhaxor, if you have 5 centroids and 10 datapoints, then 5*10 = 50 distances need to be computed to determine, for each datapoint, which centroid is the nearest. Yes, argmin would be used. You will practice this process in C3 W1 Assignment 1 for K-means.
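As an illustration (not the assignment code), here is a minimal NumPy sketch of that distance-then-argmin step, using made-up random data and centroids:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 2))         # 10 datapoints in 2-D (made-up data)
centroids = rng.random((5, 2))  # 5 centroids (made-up positions)

# distances[i, j] = Euclidean distance from datapoint i to centroid j,
# giving a 10 x 5 matrix, i.e. the 5*10 = 50 distances mentioned above
distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

# argmin over the centroid axis picks the nearest centroid for each datapoint
assignments = np.argmin(distances, axis=1)
print(assignments)  # one centroid index (0..4) per datapoint
```

The broadcasting trick computes all 50 distances at once; a double `for` loop over points and centroids does the same thing, just more slowly.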

Cheers,
Raymond


Thank you @rmwkwok, that cleared up my doubt.

Also, I see we can use clustering as a preprocessing step for supervised learning. I don't know whether this is done in practice or found to be effective.

You are welcome, @tbhaxor!

There are discussions and papers on the internet about clustering as a preprocessing tool! We can read them and see whether they make sense.
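One common variant of this idea is to use the distances to the fitted centroids as extra features for a supervised model. A hedged NumPy sketch, assuming the centroids have already been obtained from a k-means run on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((20, 3))         # hypothetical feature matrix: 20 samples, 3 features
centroids = rng.random((4, 3))  # assumed output of a k-means run with 4 clusters

# Distance from every sample to every centroid: a 20 x 4 matrix
dist_feats = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

# Append the distances as new columns; X_aug could then be fed to any
# supervised model in place of the original X
X_aug = np.hstack([X, dist_feats])
print(X_aug.shape)  # (20, 7)
```

Whether these extra columns help depends on the dataset, so it is worth validating against a baseline model trained on `X` alone.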

Cheers,
Raymond
