Question on C3_W1_KMeans_Assignment

Hi

I guess I missed the explanation of np.linalg.norm before the first lab. According to Google, it calculates norms of vectors.
Is it calculating the Euclidean distance between two points on a graph? Could you please shed some light? Thanks
np.linalg.norm(X[i] - centroids[j])
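For a 1-D array, np.linalg.norm computes the 2-norm (L2 norm) by default. When you apply it to the difference of two points, you get exactly the Euclidean distance between them. Here is a quick check with two made-up points (the values are illustrative, not taken from the lab):

```python
import numpy as np

# Two example points (hypothetical values, not from the lab)
x = np.array([1.0, 2.0])
c = np.array([4.0, 6.0])

# np.linalg.norm of a 1-D difference vector defaults to the 2-norm
d_norm = np.linalg.norm(x - c)

# The same quantity written out: square root of the sum of squared differences
d_manual = np.sqrt(np.sum((x - c) ** 2))

print(d_norm, d_manual)  # 5.0 5.0
```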

In the following graph, the initial positions of both the blue and green cluster centroids are closer to the blue cluster, especially at the second iteration. How, then, does the green centroid end up in the green cluster? Is it possible for two or more centroids to associate themselves with the same cluster?

First, the cluster centroids are initialized at certain points. Then each point is assigned to its nearest centroid, each centroid is updated to the center (mean) of the points assigned to it, and these two steps repeat. That is why the centroids move with the iterations!
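The loop described above can be sketched in NumPy as follows. This is a minimal illustration, not the lab's reference solution. The function names assign_clusters, move_centroids and run_kmeans are hypothetical. A centroid with no assigned points is simply left where it is; that is an assumption, and the lab may handle the case differently.

```python
import numpy as np

def assign_clusters(X, centroids):
    """Return, for each row of X, the index of the nearest centroid (Euclidean distance)."""
    # distances[i, j] = ||X[i] - centroids[j]||, computed via broadcasting
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(distances, axis=1)

def move_centroids(X, idx, centroids):
    """Move each centroid to the mean of the points currently assigned to it."""
    new_centroids = centroids.copy()
    for k in range(len(centroids)):
        members = X[idx == k]
        if len(members) > 0:  # assumption: leave an empty centroid in place
            new_centroids[k] = members.mean(axis=0)
    return new_centroids

def run_kmeans(X, initial_centroids, max_iters=10):
    centroids = np.array(initial_centroids, dtype=float)
    history = [centroids.copy()]
    for _ in range(max_iters):
        idx = assign_clusters(X, centroids)             # step 1: assign points
        centroids = move_centroids(X, idx, centroids)   # step 2: move centroids
        history.append(centroids.copy())
    return centroids, idx, history

# Made-up data: two well-separated groups of four points each
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
initial = np.array([[0.0, 0.0], [1.0, 1.0]])  # both centroids start in the first group

centroids, idx, history = run_kmeans(X, initial)
print(history[1])  # positions after the first update
print(centroids)   # final positions
```

Both starting centroids sit in the first group. After the first update, centroid 1 lands at about (8.6, 8.6), because it won the point (1, 1) plus the whole second group. On the next iteration it settles at (10.5, 10.5), and centroid 0 settles at (0.5, 0.5).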

See this animation as well:

Yes, I understand the movement part of centroid.

The lab showed the picture posted above, in which the green centroid starts out closer to the blue cluster. Based on the math presented by Andrew, it should have settled around the blue cluster. In fact, it first moves even closer to the blue cluster and only then finds the green cluster. I am unable to wrap my head around how the math presented in the video matches the picture posted above.

Hi there

I completed Exercise 1, but I encountered an error that does not seem to be related to Exercise 1. I'd appreciate it if you could assist with this.

Many thanks
Christina

If you have 3 centroids, remember that array indexing starts from 0, so the third centroid is at index 2!

I hope my question is clear.

I am not certain.

The confusion with the cluster chart in your first post comes from what it shows: the trajectory of the centroids, overlaid on the final color-coded assignment of examples to clusters.

Cluster membership can change on each iteration.

To illustrate with a couple of frame-grabs from the animation @gent.spah posted:

Here are the initial centroids and the associated clusters:

I have (badly) drawn a curve around all of the points that are associated with the “yellow” cluster. It should be an ellipse, but I’m drawing with a mouse pointer. Note that half of the points in what will eventually become the “blue” cluster are assigned to “yellow”, because the “blue” cluster is split in half due to where the initial “yellow” centroid is located.

Here are the updated centroids after the first iteration:

Note that the “yellow” centroid has moved to the mean value of all its members.
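Here is a tiny numeric illustration of that update step, using three made-up member points. The new centroid is the per-coordinate mean. Moving there lowers the total squared distance to the members compared with a hypothetical old position at the origin, which is why the mean is used.

```python
import numpy as np

# Hypothetical members of one cluster after an assignment step
members = np.array([[1.0, 2.0],
                    [3.0, 2.0],
                    [2.0, 5.0]])

# Update step: the centroid moves to the mean of its members, per coordinate
new_centroid = members.mean(axis=0)
print(new_centroid)  # [2. 3.]

def sq_cost(c):
    """Total squared distance from the members to a candidate centroid c."""
    return np.sum((members - c) ** 2)

old_centroid = np.array([0.0, 0.0])  # assumed previous position, for comparison
print(sq_cost(old_centroid), sq_cost(new_centroid))  # 47.0 8.0
```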

Here are the new cluster assignments after the 2nd iteration.

Again, I have drawn a curve around the members of the “yellow” cluster. Note that many of the points previously in the “yellow” cluster are now in the “blue” cluster.

Thank you for your swift reply. I can see the expected output. Could you help with the error message (which does not seem to be related to Exercise 1) so I can get the expected output?

Thanks
Christina

@Christina_Fan, sorry, I was replying to Venkat’s question.

@Christina_Fan
The error typically means one of two things:

  1. Either you are indexing a scalar variable as if it were an array, like:
    a = 2
    a[0]

  2. Or, as @gent.spah posted, you are accessing an array beyond its range.
    For example, an idx array with three items can only be accessed as
    idx[0], idx[1], idx[2]

    If changing the code to idx[:2] still produces the same error, you may want to print(idx) and check its contents.
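Here is a small sketch that shows both failure modes side by side. In plain Python, indexing the int a = 2 raises TypeError rather than IndexError. Indexing a NumPy scalar such as np.float64 raises IndexError. The exact error text from the screenshot isn't in the thread, so which case applies is an assumption. try_index is a hypothetical helper written only for this illustration.

```python
import numpy as np

def try_index(obj, i):
    """Return the name of the exception raised by obj[i], or 'ok' if indexing works."""
    try:
        obj[i]
        return "ok"
    except (IndexError, TypeError) as e:
        return type(e).__name__

a = 2                      # plain Python int
s = np.float64(2.0)        # NumPy scalar, e.g. the result of a reduction like np.sum
idx = np.array([0, 2, 1])  # three items: valid indices are 0, 1, 2

print(try_index(a, 0))    # TypeError  (a Python int is not subscriptable)
print(try_index(s, 0))    # IndexError (invalid index to scalar variable)
print(try_index(idx, 2))  # ok
print(try_index(idx, 3))  # IndexError (index 3 is out of bounds for size 3)
```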

Thank you @TMosh
So this means that even if the initial centroids are deep in the heart of the blue cluster, in successive iterations three of them will gravitate toward the yellow, cyan, and green clusters.

However, the math still befuddles me: why don't two centroids end up sharing a cluster?

You’re misinterpreting what a cluster is. Clusters aren’t labeled in advance. The centroids move so that each one locates itself near a grouping of similar examples.

It’s a statistical process, and in rare situations you might get very unlucky with the initial centroids and not end up with a nice, clean solution.
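To illustrate the "unlucky start" case, here is a made-up 1-D example with three well-separated groups around 0, 10 and 20. It is a sketch under those assumptions, not lab code. With a good start, plain K-means finds the three groups. With a bad start, where two centroids begin inside the same group, it converges to a stable but much worse split. One centroid covers two groups while two others share the third, so it does not recover on its own. A common remedy, also discussed in the course, is to run K-means from several random initializations and keep the result with the lowest cost.

```python
import numpy as np

def kmeans(X, centroids, iters=10):
    """Plain K-means: assign each point to its nearest centroid, then move centroids to member means."""
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        idx = np.argmin(dists, axis=1)
        # assumes no cluster ends up empty (true for the starting points used below)
        centroids = np.array([X[idx == k].mean(axis=0) for k in range(len(centroids))])
    return centroids, idx

def cost(X, centroids, idx):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(np.sum((X - centroids[idx]) ** 2))

# Three well-separated 1-D groups around 0, 10 and 20 (made-up data)
X = np.array([-1, 0, 1, 9, 10, 11, 19, 20, 21], dtype=float).reshape(-1, 1)

good_init = np.array([[0.0], [10.0], [20.0]])
bad_init = np.array([[10.0], [19.0], [21.0]])  # two centroids start inside the same group

for name, init in [("good", good_init), ("bad", bad_init)]:
    c, labels = kmeans(X, init)
    print(name, c.ravel().tolist(), cost(X, c, labels))
# good [0.0, 10.0, 20.0] 6.0
# bad [5.0, 19.5, 21.0] 154.5
```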

Hello @Venkat_Subramani,

Another way to look at it: first remember that each point can be assigned to only one cluster, and it is the assigned points that pull the centroid toward them.

C1 and C2 may both be close to the blues, but each blue can belong to either C1 or C2, not both.

In this case, C1 gets perhaps half of the blues (and few or no greens), and C2 gets the other half PLUS many of the greens. So the greens in C2 pull C2 toward them more strongly than the blues in C2 can hold it back!
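This picture can be reproduced with a small made-up example: four blue points near the origin, four green points around x = 6 to 7, and both starting centroids near the blues. The positions are hypothetical. The point is that argmin gives each point exactly one centroid, and each centroid then moves to the mean of only the points it won.

```python
import numpy as np

blues = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
greens = np.array([[6, 0], [6, 1], [7, 0], [7, 1]], dtype=float)
X = np.vstack([blues, greens])
is_green = np.array([False] * 4 + [True] * 4)

# Both starting centroids sit near the blues (hypothetical positions)
centroids = np.array([[0.0, 0.5],   # C1
                      [1.5, 0.5]])  # C2

for it in range(3):
    # Each point gets exactly ONE centroid: the nearest one (argmin)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    idx = np.argmin(dists, axis=1)
    for k in range(2):
        members = idx == k
        print(f"iter {it}: C{k + 1} has {np.sum(members & ~is_green)} blues, "
              f"{np.sum(members & is_green)} greens")
    # Each centroid moves to the mean of the points it won
    # (assumes neither centroid ends up with zero points, true for this data)
    centroids = np.array([X[idx == k].mean(axis=0) for k in range(2)])
    print(f"iter {it}: centroids -> {centroids.round(2).tolist()}")
```

On the first iteration, C2 wins two blues and all four greens, so its mean jumps to about x = 4.67, well toward the greens. On the next iteration it keeps only the greens, and C1 takes all the blues.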

Cheers,
Raymond

@rmwkwok
Thank you, Raymond
I was missing the point that each point is assigned to only one centroid. Now it all makes sense.

Thanks again.

You’re welcome, @Venkat_Subramani!