Reason for numpy efficiency mentioned in the lecture

vangap · July 7, 2022, 3:56am

Hi,

In Week 2, Vectorization part 1, it is mentioned that numpy is faster because it utilizes parallel hardware, and it seems to suggest that numpy can leverage the GPU.

However, I understand from multiple sources (1) that numpy doesn’t do any parallel processing and cannot leverage the GPU.

The reason numpy is fast is that Python is slow at running for loops, whereas numpy performs these operations in compiled C code.
https://numpy.org/doc/stable/user/whatisnumpy.html
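
The loop-versus-compiled-code point can be checked with a rough timing sketch. This is only an illustration: the helper name `dot_loop` and the array size are made up for this example, and the timings will vary by machine and by how numpy was built.

```python
import time

import numpy as np


def dot_loop(a, b):
    """Dot product with an explicit Python for loop (hypothetical helper)."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


# Arbitrary size chosen for illustration only.
rng = np.random.default_rng(0)
a = rng.random(200_000)
b = rng.random(200_000)

start = time.perf_counter()
loop_result = dot_loop(a, b)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vec_result = np.dot(a, b)  # runs in numpy's compiled code
vec_seconds = time.perf_counter() - start

print(f"loop:   {loop_result:.4f} in {loop_seconds:.4f} s")
print(f"np.dot: {vec_result:.4f} in {vec_seconds:.4f} s")
```

Both versions compute the same number; only the time taken should differ. The sketch does not show why the compiled version is faster, only that it is.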

Am I misinterpreting something from the lecture?

Thanks.

rmwkwok · July 7, 2022, 4:50am

Hello @vangap,

C and Python manage memory differently. C allocates contiguous memory locations for an array of data of one and only one datatype, but Python doesn’t, so we can expect more overhead in Python code. This is why an ordinary numpy array cannot hold both numbers and text as separate types.

Here is a numpy doc on how it utilizes CPU/SIMD for optimization.
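
A small sketch of the single-datatype point, assuming a default numpy install. The variable names are only for illustration. When numbers and text are mixed, numpy does not raise an error; it converts everything to one common type (here, strings).

```python
import numpy as np

# A Python list can hold objects of different types side by side.
py_list = [1, "two", 3.0]

# A numpy array stores one datatype per array.
numbers = np.array([1, 2.5, 3])
print(numbers.dtype)                   # float64: the ints were upcast
print(numbers.flags["C_CONTIGUOUS"])   # True: one contiguous block
print(numbers.itemsize, numbers.nbytes)  # 8 bytes per item, 24 in total

# Mixing numbers and text coerces everything to a string dtype.
mixed = np.array([1, "two"])
print(mixed.dtype.kind)                # 'U' means unicode string
print(repr(mixed[0]))                  # the integer 1 became a string
```

An array created with `dtype=object` can hold mixed Python objects, but it then stores references to them rather than raw values, so it gives up the compact layout described above.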

If you are interested, here is one other package that leverages GPU and is used with numpy.

Cheers!
Raymond

vangap · July 7, 2022, 6:22am

Thanks for the response @rmwkwok

Apologies if my original post wasn’t clear. What I am trying to understand is why the lecture gives the reason for numpy being fast as “numpy does parallel processing and that it can leverage GPU/parallel hardware” when, based on the linked article in my post, that’s not the case.

rmwkwok · July 7, 2022, 6:38am

I have read the section “Why is NumPy Fast?” in your link. It describes vectorization and broadcasting at a high level, not exactly what makes vectorization fast in numpy.

My 2nd link is about CPU/SIMD, which numpy uses, and SIMD is a hardware-dependent parallel processing technology.

vangap · July 7, 2022, 6:49am

Ok, I have read more about SIMD (Single Instruction, Multiple Data), which allows instructions to run in parallel on recent hardware. Things are clearer now. (This is mentioned in the optional lab notebook, which I just finished.)

Unfortunately, the numpy documentation seemed cryptic. Four points are mentioned in the section “Why is NumPy Fast?”, all of which relate to code quality/readability rather than performance. There was no mention of SIMD. Vectorization is mentioned in that section, but I suppose SIMD is the crucial thing here that gives numpy its big performance gains.

What about the reference to the GPU in the lecture? Numpy itself can’t leverage it. Is it referenced in the context of something like Numba, which you referenced above?

Thanks.

rmwkwok · July 7, 2022, 8:26am

Hello @vangap, it is also my understanding that numpy alone doesn’t use the GPU. I am not sure why the lecture mentioned the GPU that way, but Numba is one way for numpy code to leverage the GPU.
