Week 2: Vectorization and why NumPy is fast

In the first video on vectorization, Prof. Ng gives an example where computing the dot product of two arrays of a million elements each takes around 450 ms with a for loop and 1.5 ms with NumPy's dot operation.
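
A sketch of that experiment, assuming NumPy is installed. The array size matches the lecture, but the seed, `np.random.default_rng` and `time.perf_counter` are assumptions of this sketch, not necessarily what the lecture used. Your timings will depend on your machine.

```python
import time

import numpy as np

n = 1_000_000  # one million elements per array, as in the lecture
rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
a = rng.random(n)
b = rng.random(n)

# Explicit Python loop: the interpreter handles one element per iteration.
start = time.perf_counter()
loop_result = 0.0
for i in range(n):
    loop_result += a[i] * b[i]
loop_ms = (time.perf_counter() - start) * 1000

# Vectorized: a single call into NumPy's compiled dot routine.
start = time.perf_counter()
vec_result = np.dot(a, b)
vec_ms = (time.perf_counter() - start) * 1000

print(f"for loop: {loop_ms:.1f} ms, np.dot: {vec_ms:.3f} ms")
print(f"results: {loop_result:.6f} vs {vec_result:.6f}")
```

Both versions compute the same sum up to floating-point rounding; only the time differs.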

But this runs on a CPU, not a GPU, so a speedup of about 300x can't come from parallelism or SIMD alone.

I think that on a single-core CPU, NumPy's dot is faster than a Python for loop not because of vectorization/SIMD but because many of these operations are implemented in C.
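
One rough way to test this idea without NumPy, as a sketch using only the standard library: `sum(map(operator.mul, xs, ys))` moves the loop itself into C, yet every element is still a boxed Python float, so SIMD plays essentially no role. If it comes out noticeably faster than the explicit loop, that part of the gap is interpreter overhead rather than vector instructions. The function names, array size and seed are illustrative assumptions, and timings vary with machine and Python version.

```python
import operator
import random
import time

n = 1_000_000
random.seed(0)
a = [random.random() for _ in range(n)]
b = [random.random() for _ in range(n)]

def dot_python_loop(xs, ys):
    # Each iteration is dispatched by the interpreter, bytecode by bytecode.
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

def dot_c_builtins(xs, ys):
    # The loop runs inside C (map, operator.mul and sum are built-ins),
    # but each product is still a newly allocated Python float object.
    return sum(map(operator.mul, xs, ys))

for fn in (dot_python_loop, dot_c_builtins):
    start = time.perf_counter()
    result = fn(a, b)
    ms = (time.perf_counter() - start) * 1000
    print(f"{fn.__name__:<16} {ms:8.1f} ms  result={result:.6f}")
```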

I also wrote plain C code that multiplies two million-element arrays with ordinary for loops, and it took only a few milliseconds on a single core.

I had the same question, and it seems that Python is simply much slower than C at numerical calculations and that NumPy is recommended for such calculations. I looked at these two Stack Overflow questions: "Why is Python so slow at numeric calculations?" and "Why is vectorization, faster in general, than loops?"

Please correct me if I'm wrong.

Exactly. Python is a performance disaster.

There are several points to consider here. The first is that Python is an interpreted language, whereas C is a compiled language. In practice, though, modern JIT (Just-In-Time) compilation can make that difference much less significant for code that runs many times: a JIT compiler, such as the one in PyPy, generates compiled code for the Python routines that are executed frequently. The standard CPython interpreter does not do this by default, so a plain Python loop there is still interpreted one bytecode instruction at a time.
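
To see what "interpreted" means in practice, a sketch using the standard `dis` and `sys` modules: it reports which Python implementation is running (for example `cpython` or `pypy`) and prints the bytecode of a simple dot-product loop. `dot_loop` is just an illustrative name. In CPython, the instructions in the loop body are dispatched by the interpreter once per element; a JIT such as PyPy's can compile a hot loop like this to machine code instead. Opcode names differ between Python versions, so no output is shown here.

```python
import dis
import sys

def dot_loop(a, b):
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

# Which Python implementation is running ("cpython", "pypy", ...)?
print("implementation:", sys.implementation.name)

# The bytecode the interpreter steps through; everything between the loop
# header and the jump back to it is executed once per element.
dis.dis(dot_loop)

loop_ops = [ins.opname for ins in dis.get_instructions(dot_loop)]
print(len(loop_ops), "bytecode instructions in dot_loop")
```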

The other important point is that every modern CPU has vector instructions, even without a separate GPU; see, e.g., the Wikipedia page on Intel's standard vector instructions. Even in a compiled language like C, the CPU's vector instructions still make a big difference. If you write your million-element array computations as loops in C and then run the equivalent computation through a real vectorized linear algebra library, you'll find the library is still much faster. Hardware vector instructions make a very significant difference once you're dealing with reasonably sized objects.
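
To check which vector instruction sets NumPy actually detects on your own CPU, a sketch that relies on `numpy.show_runtime()`, which exists in NumPy 1.24 and later. It prints a baseline set of SIMD extensions NumPy was built to assume, the extra ones found at runtime, and, if the optional `threadpoolctl` package is installed, the BLAS library in use. The helper name `simd_report` is hypothetical, and the exact feature names depend on your CPU and NumPy build.

```python
import contextlib
import io

import numpy as np

def simd_report() -> str:
    # Capture what NumPy prints so the report can also be searched as text.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        if hasattr(np, "show_runtime"):  # added in NumPy 1.24
            np.show_runtime()
        else:
            print("numpy.show_runtime() needs NumPy >= 1.24; try np.show_config()")
    return buf.getvalue()

report = simd_report()
print(report)
```

Look for the `simd_extensions` entry in the output: names such as SSE or AVX variants on x86, or NEON/ASIMD on ARM, are the CPU vector instructions discussed above.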