I noticed that using
labels = np.append(labels, line[0])
takes extremely long when used inside the loop over the lines from the csv reader. Instead,
labels.append(line[0])
is much faster.
Why is that?
The behavior of those two operations is different, both in terms of memory management and in terms of the output type.
np.append creates a complete new copy of the input data on every call (the old array is then garbage collected), and it returns a numpy array rather than a list. Called inside a loop, each iteration copies everything appended so far, so the total work grows quadratically with the number of lines.
list.append is an in-place operation: the existing elements are not copied, and each append takes amortized constant time. If the list gets long, this is a lot faster.
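The usual idiom, then, is to collect the values in a plain Python list inside the loop and convert to a numpy array once at the end. A minimal sketch (the `rows` data here is a made-up stand-in for the csv reader loop from the question):

```python
import numpy as np

# Hypothetical data standing in for csv.reader rows
rows = [["cat", 1], ["dog", 2], ["bird", 3]]

labels = []                 # plain list: append is amortized O(1)
for line in rows:
    labels.append(line[0])  # no copying of the existing elements

labels = np.array(labels)   # single O(n) conversion after the loop
print(labels)
```

This keeps the loop linear in the number of rows and still ends up with a numpy array.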
You can verify the copy-versus-in-place difference by using Python's id() function to see whether the object in memory changes or not. Here's a little code block to show what I mean:
import numpy as np

np.random.seed(42)
A = list(np.random.randint(0, 10, (8,)))
print(f"type(A) = {type(A)}")
print(f"A = {A}")
print(f"id(A) = {id(A)}")

# np.append returns a brand-new array; A itself is untouched
B = np.append(A, 42)
print(f"type(B) = {type(B)}")
print(f"B = {B}")
print(f"id(B) = {id(B)}")

# list.append mutates A in place; its id stays the same
A.append(42)
print(f"type(A) = {type(A)}")
print(f"A = {A}")
print(f"id(A) = {id(A)}")
Running that gives this output:
type(A) = <class 'list'>
A = [6, 3, 7, 4, 6, 9, 2, 6]
id(A) = 127705198963040
type(B) = <class 'numpy.ndarray'>
B = [ 6  3  7  4  6  9  2  6 42]
id(B) = 127705198964128
type(A) = <class 'list'>
A = [6, 3, 7, 4, 6, 9, 2, 6, 42]
id(A) = 127705198963040
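The performance gap itself is easy to measure. A hedged sketch (the loop size n is arbitrary, chosen only so the difference is visible):

```python
import time
import numpy as np

n = 5000  # illustrative iteration count, not from the original post

# np.append: every call copies the whole array -> O(n^2) total work
start = time.perf_counter()
a = np.array([], dtype=int)
for i in range(n):
    a = np.append(a, i)
t_np = time.perf_counter() - start

# list.append: amortized O(1) per call -> O(n) total work
start = time.perf_counter()
b = []
for i in range(n):
    b.append(i)
t_list = time.perf_counter() - start

print(f"np.append loop:   {t_np:.4f}s")
print(f"list.append loop: {t_list:.4f}s")
```

On any recent machine the np.append loop should come out orders of magnitude slower, and the gap widens as n grows, which is exactly the quadratic-versus-linear behavior described above.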
For other examples of how in-place operations behave differently even when the contents of the data are the same, here's another thread worth a look. Please read the whole thread, not just the linked post, to see some other surprising results of object and procedure call semantics in Python.
Great, thanks for the explanation.