Say we have a data point [5,3] and centroids [8,6] and [1,3]. How do we calculate which centroid is closer? And what does || x_i - \mu_k ||^2 do? Could anyone explain, please?

Hi!

To figure out the distance between a data point and a centroid, we take the l^2 (Euclidean) norm of their difference, written || x_i - \mu_k ||. The expression in your question, || x_i - \mu_k ||^2, is simply that distance squared.

For a two-dimensional vector x = (x_1, x_2), the l^2 norm is defined as \sqrt{x_1^2 + x_2^2}.
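The same recipe extends to any number of dimensions: square each component, add the squares, and take the square root. Here's a minimal Python sketch; `l2_norm` is just a name picked for illustration, not a library function.

```python
import math

def l2_norm(v):
    """Return the l^2 (Euclidean) norm of a vector given as a list of numbers."""
    # Square each component, add the squares, then take the square root.
    return math.sqrt(sum(component ** 2 for component in v))

print(l2_norm([3, 4]))  # 5.0
```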

In your example, the data point is [5,3] and the centroids are [8,6] and [1,3].

For each centroid, we first compute the difference vector between the data point and that centroid.

Then we take the l^2 norm of that difference vector.
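If it helps to see those two steps as code, here's a minimal Python sketch applied to your numbers (the function name `distance` is only illustrative):

```python
import math

def distance(x, mu):
    """Euclidean distance ||x - mu|| between a data point x and a centroid mu."""
    # Step 1: difference vector, component by component.
    diff = [xj - mj for xj, mj in zip(x, mu)]
    # Step 2: l^2 norm of the difference vector.
    return math.sqrt(sum(d ** 2 for d in diff))

x = [5, 3]
print(round(distance(x, [8, 6]), 2))  # 4.24
print(distance(x, [1, 3]))            # 4.0
```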

For the distance between [5,3] and [8,6]:

\sqrt{(x_i - \mu_k)_1^2 + (x_i - \mu_k)_2^2} => \sqrt{(5 - 8)^2 + (3 - 6)^2} = \sqrt{9 + 9} = \sqrt{18} \approx 4.24

For the distance between [5,3] and [1,3]:

\sqrt{(5 - 1)^2 + (3-3)^2} = \sqrt{16} = 4

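We then take the smaller of the two distances, 4 < 4.24, and conclude that [5,3] is closer to [1,3], so it should belong to that centroid's cluster. If you use the squared form || x_i - \mu_k ||^2, you compare 18 and 16 instead and get the same answer, because squaring never changes which of two non-negative numbers is smaller.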
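The whole assignment step (compute the distance to every centroid, pick the smallest) can be sketched in Python like this. It compares squared distances, as in || x_i - \mu_k ||^2, so no square root is needed; the function names `squared_distance` and `nearest_centroid` are hypothetical, chosen just for this example.

```python
def squared_distance(x, mu):
    """||x - mu||^2: the squared Euclidean distance (no square root taken)."""
    return sum((xj - mj) ** 2 for xj, mj in zip(x, mu))

def nearest_centroid(x, centroids):
    """Return the index of the centroid with the smallest squared distance to x.

    On a tie, min() keeps the first (lowest-index) centroid.
    """
    return min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))

x = [5, 3]
centroids = [[8, 6], [1, 3]]
print([squared_distance(x, c) for c in centroids])  # [18, 16]
print(nearest_centroid(x, centroids))               # 1, i.e. the centroid [1, 3]
```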

If this doesn’t quite make sense, take a look at this explanation here:

Thank you so much! I finally figured it out.