Say we have a data point [5,3] and centroids [8,6] and [1,3]. How do I calculate which centroid is closer? And what does ||x_i - μ_k||^2 do? Could anyone explain, please?
Hi!
To figure out the distance between a data point and a centroid, we calculate something called the l^2 (Euclidean) norm of the difference between the data point and the centroid.
The l^2 norm of a two-dimensional vector x = (x_1, x_2) is defined as \sqrt{x_1^2 + x_2^2}.
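A minimal Python sketch of that definition, assuming vectors are plain lists of numbers. It generalizes the two-dimensional formula to any number of dimensions; the name `l2_norm` is just an illustrative choice.

```python
import math

def l2_norm(v):
    """Euclidean (l^2) norm: square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in v))

print(l2_norm([3, 4]))  # 5.0
```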
In your example, the data point is [5,3] and the centroids are [8,6] and [1,3].
For each centroid, we compute the difference vector between the data point and that centroid.
Then we take the l^2 norm of that difference vector.
For the distance between [5,3] and [8,6]:
\sqrt{(x_i - \mu_k)_1^2 + (x_i - \mu_k)_2^2} = \sqrt{(5-8)^2 + (3-6)^2} = \sqrt{9 + 9} = \sqrt{18} \approx 4.24
For the distance between [5,3] and [1,3]:
\sqrt{(5-1)^2 + (3-3)^2} = \sqrt{16} = 4
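The two calculations above can be reproduced with a short Python sketch. `euclidean_distance` is a hypothetical helper name, not part of any library. It forms the difference vector and then takes its l^2 norm, exactly as described.

```python
import math

def euclidean_distance(x, mu):
    """l^2 norm of the difference vector x - mu."""
    diff = [a - b for a, b in zip(x, mu)]
    return math.sqrt(sum(d * d for d in diff))

x = [5, 3]
centroids = [[8, 6], [1, 3]]
for mu in centroids:
    print(mu, round(euclidean_distance(x, mu), 2))
# [8, 6] 4.24
# [1, 3] 4.0
```

On Python 3.8 or newer, the standard library's `math.dist(p, q)` computes the same Euclidean distance directly.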
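We then take the smaller of the two distances. Since 4 < 4.24, [5,3] is closer to [1,3] and should belong to that cluster. As for \|x_i - \mu_k\|^2 in your formula: it is the squared distance (18 and 16 here). Squaring preserves the order of non-negative numbers, so comparing squared distances picks the same centroid and saves computing the square root.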
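The assignment step can be sketched in Python using the squared distance \|x_i - \mu_k\|^2 directly. This is a minimal illustration assuming points are plain lists. `squared_distance` and `closest_centroid` are hypothetical names. Ties go to the first centroid because that is how Python's `min` behaves.

```python
def squared_distance(x, mu):
    """||x - mu||^2: sum of squared component differences (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def closest_centroid(x, centroids):
    """Index of the centroid with the smallest squared distance to x."""
    return min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))

x = [5, 3]
centroids = [[8, 6], [1, 3]]
print([squared_distance(x, mu) for mu in centroids])  # [18, 16]
print(closest_centroid(x, centroids))  # 1, i.e. centroid [1, 3]
```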
If this doesn’t quite make sense, take a look at this explanation here:
Thank you so much, finally figured it out