Natural Language Processing & Word Embeddings

When we learn word-embedding models like Word2Vec or GloVe, our primary goal is to find word embeddings such that two similar words have similar embeddings. But in a skip-gram model (X → Y), we use the embedding vector e as the input, even though e is the very thing we eventually need to derive. Could someone please explain how we use the embedding matrix E to derive the embedding vector?

Let’s take a look at skip-gram with a simple network below.

import tensorflow as tf

vocab_size = 3000
embedding_size = 100

model = tf.keras.Sequential([
    tf.keras.Input(shape=(vocab_size,), name='context_word'),
    tf.keras.layers.Dense(embedding_size, use_bias=False, name='embedding'),
    tf.keras.layers.Dense(vocab_size, activation='softmax', name='target_word')
], name='skip-gram')


This is a simple version of a skip-gram model. Just as Andrew said in lecture: the input is a one-hot vector of vocabulary size (3000 in this case); the output of the hidden (embedding) layer is an embedding vector of embedding size (100 in this case); the number of parameters in the embedding layer equals the size of the embedding matrix E (3000 × 100 in this case); and the output layer produces the target word (its parameters are theta).
Hopefully it's helpful.
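To see concretely how the one-hot input "derives" the embedding from E, here is a minimal NumPy sketch (with tiny, made-up sizes for illustration): multiplying a one-hot vector by the embedding matrix just selects the corresponding row of E, which is exactly the word's embedding vector.

```python
import numpy as np

vocab_size, embedding_size = 5, 3  # tiny illustrative sizes, not the 3000 x 100 above

# Embedding matrix E: one row per vocabulary word
# (this matches the Keras Dense kernel layout, input_dim x output_dim).
E = np.arange(vocab_size * embedding_size, dtype=float).reshape(vocab_size, embedding_size)

# One-hot vector for, say, word index 2.
x = np.zeros(vocab_size)
x[2] = 1.0

# The matrix product with a one-hot input simply picks out row 2 of E:
e = x @ E
assert np.array_equal(e, E[2])  # the embedding "lookup" is just this product
```

So during training, E starts out random; the forward pass uses the current E to look up the (still-untrained) embedding, and backpropagation updates the rows of E until similar words end up with similar embeddings.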


Shouldn't the size of the embedding matrix be 100 × 3000?