C3_W1_ClusteringLab_DistanceVectorization

jacasta2 · November 10, 2022, 11:38pm

Hello,

In the function find_closest_centroids, I wonder whether it’s possible to vectorize the computation of the distance between an example and all the centroids.

In my code, I implement a loop to iterate through all examples. Within this outer loop, I implement an inner loop that, for each example, iterates through all centroids to compute the distance between the example and each of the centroids. It looks more or less like the following in pseudocode:

examples_loop:
   centroids_loop:
      distance_i,j = ||example_i - centroid_j||**2

I wonder whether it’s possible to get rid of the inner loop and compute the distance between an example and all the centroids with a single line of code using vectorization. Something like the following:

examples_loop:
   distances_i = ||example_i - centroids||**2

Any hint is appreciated.

Thanks!

AbdElRhaman_Fakhry · November 11, 2022, 12:39am

Hi @jacasta2

Of course you can. I do it with a single for loop over the number of clusters (= 3). You can use dist […]=norm(…,axis=1) so that the norm is taken across the columns, giving one distance per example, and after that you can use idx= np.argmin(…,axis=1) to get, for each example, the index of the centroid with the lowest distance.

Please feel free to ask any questions.
Thanks,
Abdelrahman
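
A minimal sketch of the "one loop over the clusters" idea described above, on toy data. The function name, the variable names X and centroids, and the way the elided arguments are filled in are assumptions made for illustration, not necessarily what the reply had in mind, and this is not the assignment solution.

```python
import numpy as np

def closest_centroids_loop_over_k(X, centroids):
    """Assign each row of X to its nearest centroid, looping only over centroids.

    X is assumed to have shape (m, n) and centroids shape (K, n).
    """
    m = X.shape[0]
    K = centroids.shape[0]
    dist = np.zeros((m, K))
    for j in range(K):
        # X - centroids[j] broadcasts the (n,) centroid across all m rows;
        # axis=1 takes the norm over the features, giving one distance per example.
        dist[:, j] = np.linalg.norm(X - centroids[j], axis=1)
    # For each example (row), pick the column (centroid) with the smallest distance.
    return np.argmin(dist, axis=1)

# Toy data: four 2-D points and three centroids (hypothetical values).
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [9.0, 9.5]])
centroids = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]])
idx = closest_centroids_loop_over_k(X, centroids)
print(idx)  # [0 0 1 2]
```

Since the square root is monotonic, taking argmin over the norms gives the same assignment as taking it over the squared distances from the original question.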

rmwkwok · November 11, 2022, 7:25am

Hello Jaime! @jacasta2,

In fact, you don’t need any loop if you want the vectorized way. I can’t share any assignment-related answer, but to give you some idea: say you have an array called example of shape (m, n) and an array called centroids of shape (N, n). The key is to make use of numpy’s broadcasting feature so that, when you subtract the reshaped versions of the two, the result has shape (m, N, n). Element [i, j, k] of this result is the difference between sample i and centroid j in the k-th dimension.

If you are not familiar with broadcasting, please read this documentation. As you read it, you will realize that you need to change the shape of at least one of the variables to make broadcasting possible. Please experiment with it, because broadcasting is a great technique to master.

Cheers,
Raymond
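
To make the shapes in the broadcasting hint concrete, here is a small sketch on toy arrays. The names X, centroids and closest_centroids_broadcast are hypothetical, and inserting the extra axes with np.newaxis is just one possible way to change the shapes; treat it as an illustration of broadcasting under those assumptions rather than as the intended answer.

```python
import numpy as np

def closest_centroids_broadcast(X, centroids):
    """Nearest-centroid index for every row of X, with no explicit Python loop.

    X is assumed to have shape (m, n) and centroids shape (N, n).
    """
    # (m, 1, n) - (1, N, n) broadcasts to (m, N, n):
    # diff[i, j, k] is sample i minus centroid j in dimension k.
    diff = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    # Sum the squared differences over the feature axis -> (m, N) squared distances.
    sq_dist = np.sum(diff ** 2, axis=2)
    # Index of the closest centroid for each sample -> shape (m,).
    return diff.shape, np.argmin(sq_dist, axis=1)

# Toy data: m = 4 samples, N = 3 centroids, n = 2 features (hypothetical values).
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [9.0, 9.5]])
centroids = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]])
diff_shape, idx = closest_centroids_broadcast(X, centroids)
print(diff_shape)  # (4, 3, 2)
print(idx)         # [0 0 1 2]
```

One design trade-off worth noting: the intermediate array holds m * N * n values, so for very large inputs this uses more memory than a version with a loop.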
