C5 W2 A2 GloVe

francktchafa · February 8, 2024, 12:49am

I looked into the code provided for the assignment. I understand every part of the code below except this statement: word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)

Can someone please explain what is going on here? That would help tremendously. Thanks!

CODE:

import numpy as np

def read_glove_vecs(glove_file):
    with open(glove_file, 'r') as f:
        words = set()
        word_to_vec_map = {}
        for line in f:
            # Split the line into tokens: the word, then its vector components.
            line = line.strip().split()
            curr_word = line[0]
            words.add(curr_word)
            # Convert the remaining tokens (numeric strings) to a float64 array.
            word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)

        # Assign 1-based indices to the words in sorted order.
        i = 1
        words_to_index = {}
        index_to_words = {}
        for w in sorted(words):
            words_to_index[w] = i
            index_to_words[i] = w
            i = i + 1
    return words_to_index, index_to_words, word_to_vec_map

paulinpaloalto · February 8, 2024, 2:05am

For anyone else who sees this: the function is one of the utility functions provided with the assignment. Its purpose is to read in the data for a pre-trained GloVe embedding map. It is not solution code, so it’s OK to show the source and discuss how it works.

You can print out some of the data as it’s being processed to see what is going on. The input file is a text file with 400000 lines. Each line has 51 tokens: the first is a word, and the next 50 are floating point numbers written as text, which form the 50-element GloVe embedding vector for that word. So what that statement does is slice off the last 50 elements with line[1:] and convert them to a numpy array with 50 elements of type np.float64. It then stores that array in the dictionary under the word, so that word_to_vec_map maps each word in string form to its embedding.
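To see that statement in isolation, here is a minimal sketch that processes a single line by hand. The sample line and its numbers are made up for illustration, and it is shortened to 4 values where the real file has 50 per line:

```python
import numpy as np

# Hypothetical line in the same layout as the GloVe file: a word followed
# by numbers written as text (4 here for brevity, 50 in the real file).
raw_line = "the 0.418 0.24968 -0.41242 0.1217\n"

tokens = raw_line.strip().split()  # list of strings: the word, then the numbers
curr_word = tokens[0]              # the word itself
number_strings = tokens[1:]        # everything after the word, still strings

# np.array with dtype=np.float64 parses each numeric string into a float.
vec = np.array(number_strings, dtype=np.float64)

word_to_vec_map = {}
word_to_vec_map[curr_word] = vec   # the word now maps to its embedding vector

print(curr_word, vec.shape, vec.dtype)
```

The key point is that line[1:] is still a list of strings. The dtype=np.float64 argument is what turns them into numbers.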

Then it returns 3 Python dictionaries (words_to_index, index_to_words and word_to_vec_map) that give you complete access to the mappings between words, index values and embedding vectors.
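As a rough illustration of how the three dictionaries fit together, this sketch runs the same logic on a tiny made-up file. The three words, their 3-element vectors and the temporary file are assumptions for the example, not the real GloVe data:

```python
import os
import tempfile
import numpy as np

def read_glove_vecs(glove_file):
    # Same logic as the course utility, repeated so the sketch runs on its own.
    with open(glove_file, 'r') as f:
        words = set()
        word_to_vec_map = {}
        for line in f:
            line = line.strip().split()
            curr_word = line[0]
            words.add(curr_word)
            word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)
    words_to_index = {}
    index_to_words = {}
    for i, w in enumerate(sorted(words), start=1):  # indices start at 1
        words_to_index[w] = i
        index_to_words[i] = w
    return words_to_index, index_to_words, word_to_vec_map

# Write a tiny stand-in file (hypothetical words and values).
tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
tmp.write("dog 0.1 0.2 0.3\ncat 0.4 0.5 0.6\napple 0.7 0.8 0.9\n")
tmp.close()

words_to_index, index_to_words, word_to_vec_map = read_glove_vecs(tmp.name)
os.remove(tmp.name)

idx = words_to_index["cat"]       # word -> index (position in sorted order)
word = index_to_words[idx]        # index -> word, the inverse mapping
vec = word_to_vec_map[word]       # word -> embedding vector
print(idx, word, vec)
```

Note that the indices come from the sorted order of the words, not from the order of the lines in the file.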

Here’s my instrumented version of the code that I used to figure that out:

def my_read_glove_vecs(glove_file):
    with open(glove_file, 'r') as f:
        words = set()
        word_to_vec_map = {}
        lines = 0
        for line in f:
            line = line.strip().split()
            curr_word = line[0]
            words.add(curr_word)
            word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)
            my_vec = word_to_vec_map[curr_word]
            lines += 1
            if 100 < lines < 110:
                print(f"line {lines} word {curr_word} vec {my_vec.shape}")
        
        print(f"total words {lines}")
        i = 1
        words_to_index = {}
        index_to_words = {}
        for w in sorted(words):
            words_to_index[w] = i
            index_to_words[i] = w
            i = i + 1
    return words_to_index, index_to_words, word_to_vec_map

When I run that in the notebook, here’s what I get:

line 101 word so vec (50,)
line 102 word them vec (50,)
line 103 word what vec (50,)
line 104 word him vec (50,)
line 105 word united vec (50,)
line 106 word during vec (50,)
line 107 word before vec (50,)
line 108 word may vec (50,)
line 109 word since vec (50,)
total words 400000

TMosh · February 8, 2024, 3:09am

If you’re going to post code on the forum, please use the “preformatted text” tool, so that the indentation is correct and your code doesn’t look like Markdown.

paulinpaloalto · February 8, 2024, 5:04am

Good point. I edited the OP with the code formatting tool.

francktchafa · February 9, 2024, 4:58pm

Thanks for the prompt reply
