Week 1 Exercise 9: Issue with using list vs set

From the hint it’s clear that set.update works, and it did work. But I was curious to try lists: for every word we get from the edit-one set, we send it to the function again for modification, and at that point I was adding the returned set to a list by converting and appending it. So in the end I should theoretically end up with a list of all possible two-edit values of the word, and then I apply set() to the list to get the unique values. Although it’s just appending to a list, this method makes the kernel die every time. Please help me understand how this particular list operation became so expensive.
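Roughly, what I tried looks like this (a sketch, not my exact code; edit_one_letter is the assignment's function that returns the set of all one-edit words):

# Sketch of the list-based approach; assumes the assignment's
# `edit_one_letter(word)` is in scope and returns a set of strings
def edit_two_letters_via_list(word):
    all_edits = []
    for w in edit_one_letter(word):
        # convert each returned set to a list and add its items
        all_edits.extend(list(edit_one_letter(w)))
    # deduplicate only once, at the very end
    return set(all_edits)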

Hey @Naveen_Malla,
Please help me understand your query a little more clearly. We know that edit_one_letter returns a set of all the “1-character-edited” words. Now, in edit_two_letters, you first used edit_one_letter to get all the words, which you then have to iterate over.

Then, in the loop, you get a set of words for each of the words you got before, and here you are trying to change something: instead of merging each set directly into the global set, you are converting each set to a list and accumulating everything in a global list, and after the loop completes you convert that list into a set.

Am I missing anything, or is something out of order in my explanation? Please let me know.

Cheers,
Elemento

Hey, yes, your explanation is precise. This method makes the kernel die. I want to understand why.

Hey @Naveen_Malla,
I guess this is more of a Python question than an NLP question, and the answer lies in how Python implements the different data structures and their associated methods under the hood. Refer to the code below:

# importing the modules
import numpy as np
import tracemalloc

np.random.seed(0)

### Here `local_set` is a set. It simulates the output of the `edit_one_letter` function

# Simulation parameters (vary these, e.g. 5, 10, 100, to reproduce the runs below)
iters = 100
size_local_set = 100

# Approach 1: merging each local set into the global set inside the `for` loop
def approach_1():
    global_set = set()
    for _ in range(iters):
        nums = np.random.randn(size_local_set)
        local_set = {int(x * iters * size_local_set) for x in nums}
        global_set.update(local_set)

# Approach 2: storing everything in a global list (converting each set to a
# list first) and deduplicating into a set only after the loop completes
def approach_2():
    global_list = []
    for _ in range(iters):
        nums = np.random.randn(size_local_set)
        local_set = {int(x * iters * size_local_set) for x in nums}
        global_list.extend(list(local_set))

    global_set = set(global_list)
    del global_list

# Starting the monitoring
tracemalloc.start()

# function call
approach_1()

# displaying the memory
current, peak = tracemalloc.get_traced_memory()
print(f"Peak Memory Consumption of Approach 1: {peak / 10**3} KB")

# Restarting the monitoring, so that Approach 2's peak is measured
# independently of Approach 1's
tracemalloc.stop()
tracemalloc.start()

# function call
approach_2()

# displaying the memory
current, peak = tracemalloc.get_traced_memory()
print(f"Peak Memory Consumption of Approach 2: {peak / 10**3} KB")

Now, let’s see the difference in memory consumption between the two approaches as the parameters iters and size_local_set get larger and larger.

iters = 5; size_local_set = 5
Peak Memory Consumption of Approach 1: 3.5 KB
Peak Memory Consumption of Approach 2: 5.076 KB

iters = 10; size_local_set = 10
Peak Memory Consumption of Approach 1: 9.756 KB
Peak Memory Consumption of Approach 2: 16.0 KB

iters = 100; size_local_set = 100
Peak Memory Consumption of Approach 1: 542.056 KB
Peak Memory Consumption of Approach 2: 1032.968 KB

I suppose the above results resolve your query. The root cause is duplicates: a set discards a duplicate the moment it is generated, so its memory stays bounded by the number of unique words, whereas a list stores every element it receives, duplicates included, and all of it must sit in memory at once before the final set() call deduplicates it. In edit_two_letters, the one-edit sets of neighbouring words overlap enormously, so the global list grows far beyond the number of unique two-edit words and eventually exhausts memory, killing the kernel. To dig deeper, you can read about how Python implements its data structures and their time and memory complexities, which you can easily find with a single Google search. I hope this helps.
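To see the duplication concretely, here is a small self-contained demo. Note that one_edits below is a generic Norvig-style one-edit generator written just for this illustration, not the assignment's exact edit_one_letter:

import string

# A generic one-edit generator (deletes, swaps, replaces, inserts),
# standing in for the assignment's `edit_one_letter`
def one_edits(word):
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    swaps = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

# Accumulate all two-edit words of a word in a list vs a set
two_edit_list = []
for w in one_edits("letter"):
    two_edit_list.extend(one_edits(w))

print(len(two_edit_list))       # every duplicate is stored
print(len(set(two_edit_list)))  # unique words only

The list ends up several times longer than the set of unique words, and that gap is exactly the extra memory the list-based approach has to hold before the final deduplication.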

Cheers,
Elemento

Wow. I love the explanation. Thank you so much. Hope DeepLearning.AI pays you for this.

Hey @Naveen_Malla,
I am glad I could help, and yes DeepLearning.AI pays us for this, in terms of the opportunities to help our fellow learners and to give back to the community :nerd_face:

Cheers,
Elemento