Vectorization & parallel processing, where to learn more?

Does anyone know a good place to learn more about how NumPy vectorization (and Python execution more generally…) works under the hood?

Do we only see performance benefits when the vectorized code is run on a GPU?

Hello @martis880! This is also my first time looking into this question. I found this article pretty interesting because it compares NumPy (which is largely written in C) with the plain-Python way of doing the same thing. The comparison isn’t in terms of time but in the number of CPU instructions. It also shows how newer versions of NumPy make use of a CPU feature (SIMD) to speed things up further. Hope you find it interesting too!

Raymond has given you great info to learn more. One other point to make is that modern CPUs have their own vector hardware, even if you don’t have a separate GPU. So it is always worth considering vectorization. Here’s the Wikipedia page for Intel’s standard vector instruction set.
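To make the CPU point concrete, here is a minimal sketch (not taken from the article or the course) that computes the same dot product twice on the CPU only: once with a plain Python loop and once with a single `np.dot` call. The array size and the `dot_loop` helper are my own choices for illustration, and the timings will differ from machine to machine.

```python
import time

import numpy as np

# Illustrative data: two random vectors (size chosen arbitrarily).
rng = np.random.default_rng(0)
a = rng.random(200_000)
b = rng.random(200_000)


def dot_loop(x, y):
    """Dot product with an explicit Python loop (one interpreted step per element)."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


# Time the interpreted loop.
start = time.perf_counter()
loop_result = dot_loop(a, b)
loop_seconds = time.perf_counter() - start

# Time the vectorized call, which does the whole loop inside compiled code.
start = time.perf_counter()
vec_result = np.dot(a, b)
vec_seconds = time.perf_counter() - start

print(f"loop:      {loop_result:.6f} in {loop_seconds:.4f} s")
print(f"vectorized: {vec_result:.6f} in {vec_seconds:.4f} s")
```

No GPU is involved here, so any gap you see comes from running the loop in compiled code on the CPU instead of in the interpreter.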

Thanks, this was helpful! This Stack Overflow post also helped me: “Why is vectorization, faster in general, than loops?”

This is probably getting a bit beyond the scope of the course, but I’m still confused by Python vs CPython vs NumPy.

In the slow loop approach, you write Python code, which gets compiled by CPython into Python bytecode, and is then executed by CPython?

When you use Numpy, are you somehow bypassing CPython? Is it only getting bypassed for the Numpy specific operations? How does the program know to use the compiled C code from Numpy but still use the python interpreter for the non-Numpy stuff? These questions might not make sense…

No problem, I am interested in that as well, though I don’t have much experience with it. I will try to support my answer with other sources.

Yes, and as inferred from here, CPython is the default implementation of Python, and if we read this, we see that PyPy is another implementation. CPython is both an interpreter and a compiler: it compiles your source code to bytecode and then interprets that bytecode.
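If you want to see the compile step for yourself, the standard library `dis` module can show the bytecode CPython produced for a function. This is only a small sketch; `dot_loop` is a made-up example function, and the exact instruction names differ between Python versions.

```python
import dis


def dot_loop(x, y):
    """A plain Python loop, used here only as something to disassemble."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


# The function object carries a compiled code object holding raw bytecode.
code = dot_loop.__code__
print(type(code).__name__, "with", len(code.co_code), "bytes of bytecode")

# Human-readable listing of the instructions the interpreter will execute.
dis.dis(dot_loop)

# The same instructions as Python objects, in case you want to inspect them.
instructions = list(dis.get_instructions(dot_loop))
print(len(instructions), "instructions")
```

Every pass through the loop body executes several of these instructions in the interpreter, which is where the per-element overhead of the slow loop comes from.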

I think the idea is that all your Python code goes through CPython first, until it reaches the very line of code that instructs CPython to call into the compiled C code. The key is to make sure all the heavy lifting and the related memory management happens in C code, so the involvement of native Python is minimal. We can take a look at this stackoverflow answer, which leads us to this official documentation on how to build an extension for Python in C. It also asks us to compile our own C module and link it with the Python system, and I guess this is why, when CPython reaches the line that calls an external module, it is able to locate it and execute it. I think that documentation gives a more concrete example of how to write an extension yourself, so you can try calling your own C module from Python.
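On the "how does the program know" question, a small sketch may help. As far as I understand it, the interpreter does not treat compiled code specially at the call site: it simply calls whatever object the name refers to, and that object is either a Python function (with bytecode) or a built-in implemented in C. The example below uses `math.sqrt` from the standard library as a stand-in for a compiled routine, and `py_sqrt` is a hypothetical helper of mine; I am assuming NumPy's compiled routines are reached in the same general way.

```python
import inspect
import math


def py_sqrt(x):
    """Pure-Python function: CPython compiles this body to bytecode."""
    return x ** 0.5


for func in (py_sqrt, math.sqrt):
    print(
        func.__name__,
        "| type:", type(func).__name__,
        "| has bytecode:", hasattr(func, "__code__"),
        "| compiled built-in:", inspect.isbuiltin(func),
    )

# Both are called with identical syntax; only the object behind the name differs.
print(py_sqrt(9.0), math.sqrt(9.0))
```

On CPython, the Python function reports bytecode and the C one does not, even though both calls look the same in your source.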

One last reference, from the NumPy documentation:

For many types of computing needs, the extra slow-down and memory consumption can often not be spared … Therefore one of the most common needs is to call out from Python code to a fast, machine-code routine (e.g. compiled using C/C++ or Fortran). The fact that this is relatively easy to do is a big reason why Python is such an excellent high-level language for scientific and engineering programming.

There are two basic approaches to calling compiled code: writing an extension module that is then imported to Python using the import command, or calling a shared-library subroutine directly from Python using the ctypes module. Writing an extension module is the most common method.
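The second approach in that quote (calling a shared-library routine with `ctypes`) can be tried without compiling anything. Below is a minimal sketch that calls `cos` from the system C math library. It assumes a Unix-like system where `ctypes.util.find_library("m")` can locate that library; on platforms where it cannot (Windows, for example), the hypothetical helper `call_libm_cos` just returns `None`.

```python
import ctypes
import ctypes.util


def call_libm_cos(x):
    """Call the C library's cos() directly; return None if libm is not found."""
    libm_path = ctypes.util.find_library("m")  # e.g. "libm.so.6" on Linux
    if libm_path is None:
        return None  # assumption: no discoverable libm on this platform

    libm = ctypes.CDLL(libm_path)

    # Declare the C signature: double cos(double). Without this, ctypes
    # would assume int arguments and return values.
    libm.cos.argtypes = [ctypes.c_double]
    libm.cos.restype = ctypes.c_double

    return libm.cos(x)


cos_result = call_libm_cos(0.0)
print("cos(0.0) from the C library:", cos_result)
```

Declaring `argtypes` and `restype` is the part that is easy to forget: Python has no way to know the C function's signature unless you state it.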

The NumPy documentation also discusses how to write an extension module, and finally points back to the Python documentation.

When an extension module is written, compiled, and installed to somewhere in the Python path (sys.path), the code can then be imported into Python as if it were a standard python file. It will contain objects and methods that have been defined and compiled in C code. The basic steps for doing this in Python are well-documented and you can find more information in the documentation for Python itself available online at www.python.org .
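To see what "imported as if it were a standard python file" looks like in practice, here is a rough sketch that asks the import system how it would load a few modules. The module names are just examples from the standard library, and `describe` is a hypothetical helper; which loader you get for a given module depends on how your Python was built.

```python
import importlib.machinery
import importlib.util

# File suffixes that CPython recognizes as compiled extension modules
# on this platform (for example ".so" or ".pyd" variants).
suffixes = importlib.machinery.EXTENSION_SUFFIXES
print("extension suffixes:", suffixes)


def describe(name):
    """Return (name, loader name, origin) for a top-level module name."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return (name, "not found", None)
    loader = spec.loader
    # Built-in modules use a loader class directly; others use an instance.
    loader_name = loader.__name__ if isinstance(loader, type) else type(loader).__name__
    return (name, loader_name, spec.origin)


# "json" is Python source; "_json" and "math" are typically implemented in C.
for module_name in ("json", "_json", "math"):
    print(describe(module_name))
```

The `import` statement is the same in every case; the loader found through `sys.path` (or the built-in table) decides whether a `.py` file is compiled to bytecode or a compiled library is loaded.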

Mind that my quotes are just quotes, not the full picture, especially the ones from the NumPy documentation. NumPy is a collaborative project, and according to some internet stats about 95% of it is C. I am not sure how every piece of its code gets connected to Python, but I think we have covered the general idea. The bottom line is: now we know how to build our own C module and call it from Python.