Vectorization in exercise 2, C3 w1 assignment: k-means

Hello everybody,

Yesterday I completed the assignment for the k-means algorithm, but it took a long time to run (section 4.2; the code itself says it takes a couple of minutes). So I tried to implement a vectorized way of computing the closest centroids in exercise 1, and it worked. It now takes milliseconds.
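For anyone reading along, here is a minimal sketch of the broadcasting idea for the closest-centroid step. It is a generic NumPy version, not the assignment's graded code. The function name `closest_centroids_vectorized` is hypothetical, and it assumes `X` has shape (m, n) and `centroids` has shape (K, n).

```python
import numpy as np


def closest_centroids_vectorized(X, centroids):
    """Return, for each row of X, the index of the nearest centroid.

    X: (m, n) data matrix; centroids: (K, n) matrix of centroid positions.
    """
    # Broadcast (m, 1, n) - (1, K, n) -> (m, K, n) differences,
    # then sum the squared differences over the feature axis -> (m, K).
    sq_dists = np.sum((X[:, np.newaxis, :] - centroids[np.newaxis, :, :]) ** 2, axis=2)
    # Squaring preserves the distance ordering, so argmin picks the closest centroid.
    return np.argmin(sq_dists, axis=1)


X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [4.9, 5.2]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
print(closest_centroids_vectorized(X, centroids))  # [0 0 1 1]
```

The trade-off is memory: the intermediate array has shape (m, K, n), which is fine for small datasets but can grow large when m and K are big.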

Nevertheless, there’s one step in the next function, compute_centroids(), that I can’t figure out how to avoid.
Is there a way to avoid the for i in range(K): loop?
Every idea I come up with involves a NumPy array built from arrays of different sizes, which NumPy doesn’t support.
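One common way around the ragged-array problem (a sketch, not necessarily what the course intends) is to describe cluster membership with a fixed-size (m, K) one-hot matrix instead of grouping the points into separate arrays. A single matrix product then yields all K per-cluster sums at once. The name `centroids_without_loop` is hypothetical, `idx` is assumed to be an integer array of centroid indices, and leaving empty clusters as zero rows is a choice made only for this sketch.

```python
import numpy as np


def centroids_without_loop(X, idx, K):
    """Mean of the points assigned to each of the K centroids, with no loop over K.

    X: (m, n) data matrix; idx: (m,) integer array of centroid indices.
    """
    # One-hot membership: onehot[i, k] == 1.0 when point i is assigned to centroid k.
    # Its shape is always (m, K), so no ragged per-cluster arrays are needed.
    onehot = (idx[:, np.newaxis] == np.arange(K)[np.newaxis, :]).astype(float)
    # (K, m) @ (m, n) -> (K, n): row k is the sum of the points in cluster k.
    sums = onehot.T @ X
    # Number of points in each cluster, shape (K,).
    counts = onehot.sum(axis=0)
    # Divide each row by its count; empty clusters (count 0) are left as zero rows.
    return sums / np.maximum(counts, 1)[:, np.newaxis]


X = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [12.0, 14.0]])
idx = np.array([0, 0, 1, 1])
print(centroids_without_loop(X, idx, K=2))  # cluster means (1, 1) and (11, 12)
```

The price is an (m, K) temporary matrix, which is usually small for k-means with a handful of centroids.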

I have the code and the approaches I’ve tried, but I’m not posting them here because that would be against the rules.

Thank you in advance, best regards,
Manuel

My result for the test below is 365 µs ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each). What about yours?

%timeit compute_centroids_test(compute_centroids)
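`%timeit` is an IPython magic, so it only works inside the notebook. Outside it, a rough equivalent can be built with the standard `timeit` module. The sketch below compares a plain loop over K with a scatter-add variant (`np.add.at` plus `np.bincount`) and prints results in a similar mean ± std format. Both function names and the synthetic random data are hypothetical, and the numbers you get will depend on your machine and NumPy version. With only K iterations, the loop's Python overhead is small, so a vectorized version may not be much faster.

```python
import timeit

import numpy as np


def centroids_loop(X, idx, K):
    """Per-cluster mean using an explicit loop over the K centroids."""
    centroids = np.zeros((K, X.shape[1]))
    for k in range(K):
        members = X[idx == k]          # rows assigned to centroid k
        if len(members) > 0:           # leave empty clusters as zero rows
            centroids[k] = members.mean(axis=0)
    return centroids


def centroids_scatter(X, idx, K):
    """Per-cluster mean without a Python loop, via an unbuffered scatter-add."""
    sums = np.zeros((K, X.shape[1]))
    np.add.at(sums, idx, X)                             # sums[idx[i]] += X[i] for every i
    counts = np.bincount(idx, minlength=K)              # points per centroid
    return sums / np.maximum(counts, 1)[:, np.newaxis]  # empty clusters stay zero


def report(label, func, number=1000, repeat=7):
    """Print per-loop timings in roughly the same format as IPython's %timeit."""
    per_loop = np.array(timeit.repeat(func, number=number, repeat=repeat)) / number
    print(f"{label}: {per_loop.mean() * 1e6:.1f} µs ± {per_loop.std() * 1e6:.1f} µs per loop "
          f"(mean ± std. dev. of {repeat} runs, {number} loops each)")


# Hypothetical synthetic data: 300 random 2-D points assigned to K = 3 clusters.
rng = np.random.default_rng(0)
K = 3
X = rng.random((300, 2))
idx = rng.integers(0, K, size=300)

# Both versions should agree before comparing their speed.
assert np.allclose(centroids_loop(X, idx, K), centroids_scatter(X, idx, K))

report("loop over K", lambda: centroids_loop(X, idx, K))
report("np.add.at  ", lambda: centroids_scatter(X, idx, K))
```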

Hi Raymond,

My results are:
333 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Does that mean that there’s no “better implementation”? I guess I lack the expertise to know when something can’t be optimized :sweat_smile:.

Thank you for your fast response.

Hello Manuel,

It means that my approach isn’t better than yours =).
By the way, nice try, Manuel! Let us know if you run out of ideas for improving other functions.

Raymond