Problem understanding: Manipulating word embeddings code

Can someone explain to me how this code works? I’m having trouble understanding how this output is generated by this code. Thanks.

You’re plotting the vector representation of each word in the words list.
Earlier in the notebook there is a word_embeddings dictionary that holds the vector representation of each word, and a vec function that returns that vector given a word as a parameter.

So bag2d is a list holding the vector representation of each word in the words list.
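For context, a minimal sketch of what that setup might look like. The word_embeddings, vec, bag2d, and words names come from the notebook; the vectors here are shortened to 4 made-up values (real embeddings have ~300) and the extra words are purely illustrative:

```python
import numpy as np

# Hypothetical miniature setup: in the notebook, word_embeddings maps each
# word to a ~300-dimensional vector from a pre-trained model; here the
# vectors are shortened to 4 made-up values for illustration.
word_embeddings = {
    "joyful": np.array([0.0507, 0.12988281, 0.083984375, -0.0123]),
    "sad":    np.array([-0.0234, 0.0454, -0.0278, 0.0046]),
    "happy":  np.array([0.0312, 0.1104, 0.0771, -0.0087]),
}

def vec(w):
    """Return the embedding vector for the word w."""
    return word_embeddings[w]

words = ["joyful", "sad", "happy"]

# bag2d stacks the vector for each word of interest, one row per word.
bag2d = np.array([vec(word) for word in words])
```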

The first for loop plots an arrow for each word’s vector representation.
For example, for the word “joyful”, the 2nd value in the vector (word[col2]) is 0.12988281 and the 3rd value (word[col3]) is 0.083984375, so an arrow is plotted from (0, 0) to (0.083984375, 0.12988281), and so on for the other words.

The second for loop plots a label and a red dot for each word’s vector. So for “joyful”, a red dot and the label “joyful” are plotted at (0.083984375, 0.12988281).
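Putting both loops together, the plotting cell would look roughly like this. The col2/col3 indices are assumptions inferred from the values quoted above (they pick the 2nd and 3rd components of each vector), and the arrow styling parameters are illustrative:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Assumed column indices: col3 picks the 3rd component (x axis),
# col2 the 2nd component (y axis), matching the "joyful" example above.
col2, col3 = 1, 2

# First loop: draw an arrow from the origin to (word[col3], word[col2])
# for each word's vector.
for word in bag2d:
    ax.arrow(0, 0, word[col3], word[col2],
             head_width=0.005, head_length=0.005, fc='r', ec='r', width=1e-5)

# Second loop: plot a red dot and the word label at the same point.
for i in range(len(words)):
    ax.scatter(bag2d[i, col3], bag2d[i, col2], color='r')
    ax.annotate(words[i], (bag2d[i, col3], bag2d[i, col2]))

plt.show()
```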

Thanks for the reply. Can you please explain how each word is represented as a multidimensional vector with around 300 values? And how can just the 3rd and 2nd column values be sufficient to represent the word when it’s said to have hundreds of dimensions?

Each word’s vector is learned by a neural network model (like word2vec). Words are represented as multidimensional vectors in order to capture relationships between different words. You’ll learn about this in a later class.

I don’t think the 3rd and 2nd columns are sufficient on their own; they’re just used for plotting in this example. I think PCA should be used if the embeddings were to be properly represented in 2 dimensions.
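For instance, with scikit-learn, a sketch of that reduction might look like this (reusing the hypothetical bag2d and words from the earlier sketch):

```python
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Project the full embeddings down to 2 principal components
# instead of hand-picking two columns.
pca = PCA(n_components=2)
reduced = pca.fit_transform(bag2d)  # shape: (len(words), 2)

for i, word in enumerate(words):
    plt.scatter(reduced[i, 0], reduced[i, 1], color='r')
    plt.annotate(word, (reduced[i, 0], reduced[i, 1]))

plt.show()
```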


Exactly. The question is how you would graph 100-dimensional vectors in 2- or 3-dimensional space, which is (unfortunately) all we have available for drawing graphs. So the choices are either to just pick 2 or 3 dimensions, or to do PCA, as David suggests.