Problem understanding: Manipulating word embeddings code

Can someone explain to me how this code works? I’m having trouble understanding how this output is generated by this code. Thanks.

You’re plotting the vector representation of each word in the words list.
Earlier in the notebook there is a word_embeddings dictionary that holds the vector representation of each word, and a vec function that returns that vector given a word as a parameter.

So bag2d is a list holding the vector representation of each word in the words list.
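For context, a minimal sketch of what that setup might look like. The word_embeddings, vec, bag2d, and words names come from the notebook; the vectors here are shortened to 4 made-up values (real embeddings have ~300) and the extra words are purely illustrative:

```python
import numpy as np

# Hypothetical miniature setup: in the notebook, word_embeddings maps each
# word to a ~300-dimensional vector from a pre-trained model; here the
# vectors are shortened to 4 made-up values for illustration.
word_embeddings = {
    "joyful": np.array([0.0507, 0.12988281, 0.083984375, -0.0123]),
    "sad":    np.array([-0.0234, 0.0454, -0.0278, 0.0046]),
    "happy":  np.array([0.0312, 0.1104, 0.0771, -0.0087]),
}

def vec(w):
    """Return the embedding vector for the word w."""
    return word_embeddings[w]

words = ["joyful", "sad", "happy"]

# bag2d stacks the vector for each word of interest, one row per word.
bag2d = np.array([vec(word) for word in words])
```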

The first for loop plots an arrow for each word’s vector representation.
For example, for the word “joyful”, the 2nd value in the vector (word[col2]) is 0.12988281 and the 3rd value (word[col3]) is 0.083984375, so an arrow is plotted from (0, 0) to (0.083984375, 0.12988281), and so on for the other words.

The second for loop plots a label and a red dot for each word’s vector. So for “joyful”, a red dot and the label “joyful” are plotted at (0.083984375, 0.12988281).
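Putting both loops together, the plotting cell would look roughly like this. The col2/col3 indices are assumptions inferred from the values quoted above (they pick the 2nd and 3rd components of each vector), and the arrow styling parameters are illustrative:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Assumed column indices: col3 picks the 3rd component (x axis),
# col2 the 2nd component (y axis), matching the "joyful" example above.
col2, col3 = 1, 2

# First loop: draw an arrow from the origin to (word[col3], word[col2])
# for each word's vector.
for word in bag2d:
    ax.arrow(0, 0, word[col3], word[col2],
             head_width=0.005, head_length=0.005, fc='r', ec='r', width=1e-5)

# Second loop: plot a red dot and the word label at the same point.
for i in range(len(words)):
    ax.scatter(bag2d[i, col3], bag2d[i, col2], color='r')
    ax.annotate(words[i], (bag2d[i, col3], bag2d[i, col2]))

plt.show()
```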

Thanks for the reply. Can you please explain how each word is represented as a multidimensional vector with around 300 values? And how can just the 3rd and 2nd column values be sufficient to represent the word when it’s said to have hundreds of dimensions?

Each word’s vector is learned by a neural network model (like word2vec). Words are represented as multidimensional vectors in order to capture relationships between different words. You’ll learn about this in a later class.

I don’t think the 3rd and 2nd columns are sufficient on their own; they’re just used for plotting in this example. I think PCA should be used if the embeddings were to be properly represented in 2 dimensions.
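For instance, with scikit-learn, a sketch of that reduction might look like this (reusing the hypothetical bag2d and words from the earlier sketch):

```python
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Project the full embeddings down to 2 principal components
# instead of hand-picking two columns.
pca = PCA(n_components=2)
reduced = pca.fit_transform(bag2d)  # shape: (len(words), 2)

for i, word in enumerate(words):
    plt.scatter(reduced[i, 0], reduced[i, 1], color='r')
    plt.annotate(word, (reduced[i, 0], reduced[i, 1]))

plt.show()
```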


Exactly. The question is how you would graph 100-dimensional vectors in 2- or 3-dimensional space, which is (unfortunately) all we have available for drawing graphs. So the choices are either to just pick 2 or 3 dimensions, or to do PCA, as David suggests.