In Python Basics with Numpy, vectorization does not seem to bring much in your example in terms of computing time, because the matrices are too small (as you point out).
What you could do is generate huge matrices with 10^5-10^6 elements or so to show the real power of vectorization.
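A minimal sketch of that suggestion, assuming a dot product on arrays of 10^6 elements (the array sizes and variable names are my own, not from the assignment):

```python
import time
import numpy as np

# Compare a Python for loop with the vectorized numpy equivalent
# on a large array (~10^6 elements), where the speedup is visible.
n = 10**6
a = np.random.rand(n)
b = np.random.rand(n)

# Dot product with an explicit for loop
start = time.perf_counter()
dot_loop = 0.0
for i in range(n):
    dot_loop += a[i] * b[i]
loop_seconds = time.perf_counter() - start

# Vectorized dot product
start = time.perf_counter()
dot_vec = np.dot(a, b)
vec_seconds = time.perf_counter() - start

# Both give the same answer (up to floating-point rounding),
# but the vectorized version is typically orders of magnitude faster.
print(f"loop: {loop_seconds:.4f}s  vectorized: {vec_seconds:.6f}s")
```

At this size the difference should be obvious on any machine.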
On small data sets, the speed difference is indeed not meaningful. That makes small datasets great for practicing vectorization. So if I have a calculation where I use a for loop, maybe I can rewrite it using one of the numpy operations.
When I’m playing around with code, I might first write the answer using a for loop, get it to work, and then write a vectorized version of the calculation. Having two versions lets me compare the two answers, which will be the same if my code is correct. I can also time the two versions and try a larger data set for comparison.
I then erase the for-loop version and keep only the vectorized version in the final code.
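That two-version workflow might look something like this (the row-norm calculation and function names are just an illustration I made up):

```python
import numpy as np

def row_norms_loop(X):
    """Euclidean norm of each row, computed with explicit loops."""
    norms = []
    for row in X:
        total = 0.0
        for value in row:
            total += value * value
        norms.append(total ** 0.5)
    return np.array(norms)

def row_norms_vectorized(X):
    """Same computation using numpy operations on the whole array."""
    return np.sqrt(np.sum(X ** 2, axis=1))

X = np.random.rand(100, 50)
# If both versions are correct, the results match up to rounding error.
print(np.allclose(row_norms_loop(X), row_norms_vectorized(X)))  # True
```

Once the two agree, the loop version has done its job and can be deleted.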
PS. One can add %%time to the top of a notebook cell to see how long the cell’s calculation took. Please remember to remove it before submitting the assignment.
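Since %%time only works in notebook cells, a portable alternative for a plain Python script is the standard-library timeit module (the array and call count here are arbitrary choices of mine):

```python
import timeit
import numpy as np

# Time 100 calls to a vectorized numpy operation, similar in spirit
# to putting %%time at the top of a notebook cell.
a = np.random.rand(10**5)
seconds = timeit.timeit(lambda: np.sum(a), number=100)
print(f"100 calls to np.sum took {seconds:.4f}s")
```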
For me, the importance of vectorization goes beyond the speedups from loop elimination.
I view it as a change in thinking toward “big operations on big (multi-axis) values”. Those big values will have 3, 4, or more axes in the later courses.
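As a small sketch of what such a multi-axis value looks like (the image-batch shape is a hypothetical example in the spirit of the later courses, not taken from them):

```python
import numpy as np

# A batch of RGB images with 4 axes: (batch, height, width, channels).
images = np.random.rand(32, 64, 64, 3)

# One "big operation on a big value": the per-channel mean over the
# whole batch, collapsing the first three axes in a single call.
channel_means = np.mean(images, axis=(0, 1, 2))
print(channel_means.shape)  # (3,)
```

No loop over images, rows, or pixels is needed; the axes argument does all of it at once.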
That’s a good point, @GordonRobinson. Another positive side effect is that the vectorized version is typically less code and simpler code. And it runs faster. What’s not to like about that?
The important idea behind numpy and vectorization is the ability to use the SIMD instructions available on modern processors (with differences depending on the architecture).
A couple of links:
Yes, @crisrise makes a really important point here. When you call a vectorized numpy routine, it in turn calls a lower-level assembly language routine that uses special CPU instructions specifically designed to make vector computations efficient. It’s not that there’s another “for” loop buried in the library that is somehow a “better” for loop than you could write in python. It’s literally different CPU instructions designed to make this type of operation efficient.
Yes, I know about SIMD, on CPUs and GPUs. My only point was to suggest not showing the improvement with examples where there is no improvement. Just go for large vector operations if you want to illustrate it. As it is now, the tutorial fails to show the gain from vectorizing. Just my 2 cts.
Thanks @Colin, I understand the point you are making and I totally agree. I think experiencing the speed advantage in a practical case is a very good learning experience.