C2W4 parse_data_from_input slow

I noticed that using

labels = np.append(labels, line[0])

takes an extremely long time when used inside the loop over the lines from the CSV reader.

Instead,

labels.append(line[0])

is much faster.

Why is that?

The behavior of those two operations is different, both in terms of memory management and in terms of the output type.

The first one creates a completely new NumPy array with a copy of the input data on every call; the old object then gets garbage collected. The output type is also a NumPy array instead of a list. Called once per row inside a loop, that repeated copying makes the total cost grow quadratically with the number of rows.

The second one is an “in-place” operation: the element is attached to the existing list and the previous data is not copied, so each append costs amortized O(1). If the list gets long, this is a lot faster.
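
To make the difference concrete, here is a minimal timing sketch comparing the two patterns. The row count N and the dummy label value are made up for illustration:

import time
import numpy as np

N = 30_000  # made-up row count, just for illustration

# Pattern 1: np.append inside the loop. Every call allocates a new array
# and copies all previous elements, so total work grows quadratically.
start = time.perf_counter()
labels = np.array([], dtype=object)
for _ in range(N):
    labels = np.append(labels, "label")
print(f"np.append in loop:  {time.perf_counter() - start:.2f} s")

# Pattern 2: list.append inside the loop, one conversion at the end.
# Each append is amortized O(1), so total work grows linearly.
start = time.perf_counter()
labels = []
for _ in range(N):
    labels.append("label")
labels = np.array(labels)
print(f"list.append in loop: {time.perf_counter() - start:.2f} s")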

You can verify this by using the “id()” function in Python to see whether the object identity changes or not. Here’s a little code block to show what I mean:

import numpy as np

np.random.seed(42)
A = list(np.random.randint(0, 10, (8,)))
print(f"type(A) = {type(A)}")
print(f"A = {A}")
print(f"id(A) = {id(A)}")

# np.append returns a brand-new array; the list A is left untouched
B = np.append(A, 42)
print(f"type(B) = {type(B)}")
print(f"B = {B}")
print(f"id(B) = {id(B)}")

# list.append mutates A in place, so its id() stays the same
A.append(42)
print(f"type(A) = {type(A)}")
print(f"A = {A}")
print(f"id(A) = {id(A)}")

Running that gives this output:

type(A) = <class 'list'>
A = [6, 3, 7, 4, 6, 9, 2, 6]
id(A) = 127705198963040
type(B) = <class 'numpy.ndarray'>
B = [ 6  3  7  4  6  9  2  6 42]
id(B) = 127705198964128
type(A) = <class 'list'>
A = [6, 3, 7, 4, 6, 9, 2, 6, 42]
id(A) = 127705198963040
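
Applied back to the assignment, the usual fix is to collect everything in plain lists inside the loop and convert to a NumPy array once, after the loop. Here is a rough sketch of what that could look like. The function name is taken from the thread title, and the column layout (label in the first column, pixel values after it, a header row, the 28×28 reshape) is an assumption about the CSV format, not a quote from the assignment:

import csv
import numpy as np

def parse_data_from_input(filename):
    labels = []
    images = []
    with open(filename) as file:
        reader = csv.reader(file, delimiter=",")
        next(reader)  # skip the header row (assumed to exist)
        for line in reader:
            labels.append(line[0])    # amortized O(1) per row
            images.append(line[1:])
    # Convert once, after the loop: a single copy instead of one per row.
    # The float dtype and the 28x28 reshape are assumptions about the data.
    labels = np.array(labels, dtype=np.float64)
    images = np.array(images, dtype=np.float64).reshape(-1, 28, 28)
    return images, labels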

For other examples of how “in-place” operations behave differently, even when the contents of the data are the same, here’s another thread worth a look. Please read the whole thread, not just the linked post, to see some other exciting results of object and procedure-call semantics in Python. :nerd_face:
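
One consequence of the in-place behavior worth knowing: any other name bound to the same list sees the mutation, while np.append never touches the original. A small illustration:

import numpy as np

A = [1, 2, 3]
alias = A            # a second name for the same list object, not a copy
B = np.append(A, 4)  # a brand-new array; the original list is untouched

A.append(4)                # mutates A in place...
print(alias)               # [1, 2, 3, 4] -- the alias sees the change
print(B)                   # [1 2 3 4]    -- B is an independent copy
print(alias is A, B is A)  # True False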

Great, thanks for the explanation.