C3_W1_ClusteringLab_DistanceVectorization

Hello,

In the function find_closest_centroids, I wonder whether it’s possible to vectorize the computation of the distance between an example and all the centroids.

In my code, I implement a loop to iterate through all examples. Within this outer loop, I implement an inner loop that, for each example, iterates through all centroids to compute the distance between the example and each of the centroids. It looks more or less like the following in pseudocode:

examples_loop:
   centroids_loop:
      distance_i,j = ||example_i - centroid_j||**2

I wonder whether it’s possible to get rid of the inner loop and compute the distance between an example and all the centroids with a single line of code using vectorization. Something like the following:

examples_loop:
   distances_i = ||example_i - centroids||**2

Any hint is appreciated.

Thanks!

Hi @jacasta2

Of course you can. I do it with a single for loop over the number of clusters (= 3). You can use dist […]=norm(…,axis=1) so that the norm is taken across the columns of each row, and after that you can use idx = np.argmin(…,axis=1) to find, for each example, the index of the centroid with the lowest distance.

Please feel free to ask any questions.
Thanks,
Abdelrahman
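
For readers following along, the single-loop idea described above might look roughly like the sketch below. It is not the assignment solution: the arrays X and centroids are made-up toy data, and the names dist, idx, and K are assumptions chosen for illustration. Note that norm returns the plain Euclidean distance rather than its square, which does not change which centroid is closest.

```python
import numpy as np

# Hypothetical toy data: 4 examples with 2 features each, and 3 centroids.
X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [9.0, 9.0]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])

K = centroids.shape[0]            # number of clusters
dist = np.zeros((X.shape[0], K))  # dist[i, j]: distance from example i to centroid j

# One loop over centroids; each iteration handles all examples at once.
for j in range(K):
    # X - centroids[j] broadcasts the centroid across every row of X;
    # axis=1 collapses the feature columns, leaving one norm per example.
    dist[:, j] = np.linalg.norm(X - centroids[j], axis=1)

# For each example, the index of the centroid with the smallest distance.
idx = np.argmin(dist, axis=1)
print(idx)
```

The loop runs once per centroid rather than once per example, so for a small number of clusters it is already far cheaper than the nested version.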


Hello Jaime! @jacasta2,

In fact, you don’t need any loop if you want the fully vectorized way. I can’t share any assignment-related answer, but to give you some idea: say you have an array called example of shape (m, n) and an array called centroids of shape (N, n). The key is to use NumPy’s broadcasting feature so that, when you subtract reshaped versions of the two, the result has shape (m, N, n). Entry (i, j, k) of that result is the difference between sample i and centroid j in the k-th dimension.

If you are not familiar with broadcasting, please read this documentation. As you read it, you will realize that you need to change the shape of at least one of the variables to make broadcasting possible. Please experiment with it, because broadcasting is a great technique to master.

Cheers,
Raymond
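
As a generic illustration of the shapes involved (not the assignment solution), the broadcasting idea can be sketched on toy data as follows. The array values and the names diff, sq_dist, and closest are assumptions made for this example, and inserting axes with np.newaxis is only one of several ways to reshape the arrays.

```python
import numpy as np

# Hypothetical toy data: m = 4 examples, N = 3 centroids, n = 2 features.
example = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [9.0, 9.0]])  # (m, n)
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])          # (N, n)

# Insert a length-1 axis in each array so the shapes line up for broadcasting:
# (m, 1, n) - (1, N, n) -> (m, N, n)
diff = example[:, np.newaxis, :] - centroids[np.newaxis, :, :]

# Sum the squared differences over the feature axis -> (m, N)
sq_dist = np.sum(diff ** 2, axis=2)

# Index of the nearest centroid for each example -> (m,)
closest = np.argmin(sq_dist, axis=1)

print(diff.shape, sq_dist.shape, closest)
```

One trade-off to keep in mind is that the intermediate (m, N, n) array can become large for big datasets, so a loop over centroids may use less memory.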
