Question on C3_W1_KMeans_Assignment

Venkat_Subramani · May 15, 2024, 8:54am

Hi

I guess I missed the explanation of np.linalg.norm before the first Lab. From Google, this calculates norms for vectors.
Is it calculating the Euclidean distance between two points on a graph? Could you please shed some light on this? Thanks
np.linalg.norm(X[i] - centroids[j])

In the following graph, the initial positions of the blue and green cluster centroids are both closer to the Blue cluster, especially in the second iteration. Then how come the green centroid finds its home in the Green cluster? Is it possible for two or more centroids to associate themselves with the same cluster?

gent.spah · May 15, 2024, 10:04am

First the cluster centroids are initialized at certain points. Then each point is assigned to the nearest cluster centroid, and each centroid is updated to the center (the mean) of the points assigned to it. These two steps are repeated, so the centroids move with the iterations!

See this animation as well:
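On the np.linalg.norm question above: with its default arguments, np.linalg.norm applied to a 1-D vector returns the 2-norm, so np.linalg.norm(X[i] - centroids[j]) is the Euclidean distance between point i and centroid j. Below is a minimal sketch of one assign-and-update step. The helper names (assign_points, update_centroids) and the toy data are made up for illustration and are not the assignment's code.

```python
import numpy as np

def assign_points(X, centroids):
    """Return, for each row of X, the index of the nearest centroid."""
    idx = np.zeros(X.shape[0], dtype=int)
    for i in range(X.shape[0]):
        # Euclidean distance from point i to every centroid
        distances = [np.linalg.norm(X[i] - c) for c in centroids]
        idx[i] = int(np.argmin(distances))
    return idx

def update_centroids(X, idx, K):
    """Move each centroid to the mean of the points assigned to it."""
    # Assumes every centroid has at least one assigned point.
    return np.array([X[idx == k].mean(axis=0) for k in range(K)])

# Hypothetical toy data: two points near the origin, two near (10, 11)
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
centroids = np.array([[1.0, 1.0], [8.0, 8.0]])

# The norm of the difference equals sqrt(sum of squared differences)
diff = X[0] - centroids[0]
same = np.isclose(np.linalg.norm(diff), np.sqrt(np.sum(diff ** 2)))

idx = assign_points(X, centroids)
new_centroids = update_centroids(X, idx, K=2)
print(same, idx, new_centroids)
```

With this data the first two points go to centroid 0 and the last two to centroid 1, and the centroids then move to (0, 1) and (10, 11).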

Venkat_Subramani · May 16, 2024, 4:28am

Yes, I understand the movement part of centroid.

The Lab showed the picture posted above, in which the Green centroid started out closer to the Blue cluster. Based on the math presented by Andrew, it should have centered itself on the Blue cluster. In fact, it moves closer to the Blue cluster and then finds the Green cluster. I am unable to wrap my head around how the math presented in the video fits with the picture posted above.

Christina_Fan · May 16, 2024, 4:52am

Hi there

I completed ex 1, however I encountered an error which does not seem to be related to ex 1. I’d appreciate it if you could please assist with this.

Many thanks
Christina

gent.spah · May 16, 2024, 5:18am

If you have 3 centroids, then because array indexing starts from 0, centroid number 3 will be at index 2!

Venkat_Subramani · May 16, 2024, 10:34am

Hope my question is clear?

TMosh · May 16, 2024, 4:44pm

Hope my question is clear?

I am not certain.

The confusion in the chart of the clusters in your first post is that the chart shows the trajectory of the centroids, overlaid on the final color-coded assignment of examples to clusters.

Cluster membership changes on each iteration.

To illustrate with a couple of frame-grabs from the animation @gent.spah posted:

Here are the initial centroids and the associated clusters:

I have (badly) drawn a curve around all of the points that are associated with the “yellow” cluster. It should be an ellipse, but I’m drawing with a mouse pointer. Note that half of the points in what will eventually become the “blue” cluster are assigned to “yellow”, because the “blue” cluster is split in half due to where the initial “yellow” centroid is located.

Here are the updated centroids after the first iteration:

Note that the “yellow” centroid has moved to the mean value of all its members.

Here are the new cluster assignments after the 2nd iteration.

Again I have drawn a curve around the members of the “yellow” cluster. Note that many of the points previously in the “yellow” cluster are now in the “blue” cluster.
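The frame-by-frame behaviour described above can be reproduced on a tiny example. This is only a sketch with made-up 1-D data (so the absolute difference is the Euclidean distance), not the lab's dataset: both centroids start inside the left group, membership changes between the first and second iteration, and one centroid ends up at the right group.

```python
import numpy as np

# Hypothetical 1-D data: a left group (0..3) and a right group (10, 11)
X = np.array([0.0, 1.0, 2.0, 3.0, 10.0, 11.0])
# Both initial centroids start inside the left group
centroids = np.array([1.0, 2.5])

history = []
for _ in range(3):
    # Assignment step: each point joins its nearest centroid
    idx = np.argmin(np.abs(X[:, None] - centroids[None, :]), axis=1)
    history.append(idx.tolist())
    # Update step: each centroid moves to the mean of its members
    centroids = np.array([X[idx == k].mean() for k in range(len(centroids))])

for step, membership in enumerate(history, start=1):
    print(step, membership)
print(centroids)
```

In the first iteration the points 2 and 3 belong to the second centroid together with 10 and 11, which drags that centroid to 6.5. In the second iteration 2 and 3 switch to the first centroid, and the centroids settle at 1.5 and 10.5.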

Christina_Fan · May 17, 2024, 1:37am

Thank you for your swift reply. I can see the expected output. Could you help with the error message (which does not seem to be related to exercise 1) so that I can get the expected output?

Thanks
Christina

TMosh · May 17, 2024, 1:38am

@Christina_Fan, sorry, I was replying to Venkat’s question.

Venkat_Subramani · May 17, 2024, 4:24am

@Christina_Fan
The error typically means one of two things:

Either you are accessing a scalar variable as an array, like:
a = 2
a[0]
Or, as @gent.spah posted, you are accessing an array beyond its range, which could trigger this error.
For example, if the idx array has three items, they can be accessed by
idx[0], idx[1], idx[2]

If changing the code to idx[:2] still produces the same error, you may want to print(idx) and see.
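The two situations above can be checked in isolation. The exact error message from the notebook is not shown in this thread, so the sketch below only illustrates which exception type each case raises. The try_index helper and the idx values are hypothetical.

```python
import numpy as np

def try_index(value, i):
    """Return the name of the exception raised by value[i], or None."""
    try:
        value[i]
    except (TypeError, IndexError) as err:
        return type(err).__name__
    return None

idx = np.array([0, 2, 1])  # hypothetical array with three items

results = {
    "plain int": try_index(2, 0),          # a Python int is not indexable
    "numpy scalar": try_index(idx[0], 0),  # idx[0] is itself a scalar
    "out of range": try_index(idx, 3),     # valid indices are 0, 1, 2
    "last valid": try_index(idx, 2),       # no error
}
print(results)
```

Printing the variable and its type or shape just before the failing line usually shows which of the cases applies.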

Venkat_Subramani · May 17, 2024, 4:29am

Thank you @TMosh
So this means that even if the initial centroids are deep in the heart of the blue cluster, in successive iterations three of them will gravitate towards the yellow, cyan and green clusters.

However, the math behind why two centroids do not end up sharing a cluster still befuddles me.

TMosh · May 17, 2024, 5:35am

You’re mis-interpreting what a cluster is. They aren’t labeled in advance. The centroids move so they each locate themselves near a grouping of similar examples.

It’s a statistical process, and it’s entirely possible that in rare situations, you might get very unlucky with the initial clusters, and you don’t get a nice clean solution.

rmwkwok · May 19, 2024, 5:38am

Hello @Venkat_Subramani,

Another way to look at it is to first remember that each point can be assigned to only one cluster, and it is the assigned points that pull the centroid towards them.

C1 & C2 may both be close to the blues, but each blue can only belong to either C1 or C2.

In this case, C1 gets perhaps half of the blues (and a few to no greens), and C2 gets the other half PLUS many more greens. So, the greens in C2 attract C2 to them more than the blues in C2 can keep it!

Cheers,
Raymond
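The pull Raymond describes is just the arithmetic of a mean. A small sketch with invented coordinates (two blues and six greens assigned to C2, none of them taken from the lab data) shows the updated C2 landing much nearer the greens:

```python
import numpy as np

# Hypothetical members of C2: a few blues on the left, more greens on the right
blues_in_c2 = np.array([[1.0, 0.0], [1.0, 2.0]])
greens_in_c2 = np.array([[9.0, 0.0], [9.0, 2.0], [10.0, 0.0],
                         [10.0, 2.0], [11.0, 0.0], [11.0, 2.0]])

# Update step: the new centroid is the mean of all assigned points
members = np.vstack([blues_in_c2, greens_in_c2])
new_c2 = members.mean(axis=0)

# Compare how far the new centroid is from the center of each group
dist_to_blues = np.linalg.norm(new_c2 - blues_in_c2.mean(axis=0))
dist_to_greens = np.linalg.norm(new_c2 - greens_in_c2.mean(axis=0))
print(new_c2, dist_to_blues, dist_to_greens)
```

Here the new C2 is (7.75, 1.0), which is 6.75 away from the center of the blues but only 2.25 away from the center of the greens, because the six greens outweigh the two blues in the average.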

Venkat_Subramani · May 19, 2024, 6:42am

@rmwkwok
Thank you, Raymond
I was missing the fact that each point gets only one centroid. Now it all makes sense.

Thanks again.

rmwkwok · May 19, 2024, 6:49am

You’re welcome, @Venkat_Subramani!
