Natural Language Processing & Word Embeddings

Let’s take a look at skip-gram using the simple network below.

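import tensorflow as tf

vocab_size = 3000      # number of words in the vocabulary
embedding_size = 100   # dimension of each word embedding

model = tf.keras.Sequential([
    # One-hot vector for the context word
    tf.keras.Input(shape=(vocab_size,), name='context_word'),
    # Hidden layer: its weight matrix is the embedding matrix E (no bias)
    tf.keras.layers.Dense(embedding_size, use_bias=False, name='embedding'),
    # Softmax over the vocabulary to predict the target word
    tf.keras.layers.Dense(vocab_size, activation='softmax', name='target_word')
], name='skip-gram')
model.summary()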

This is a simplified version of the skip-gram model. Just like Andrew said in the lecture, the input is a one-hot vector of vocabulary size (3000 in this case), and the output of the hidden layer (the embedding layer) is an embedding vector of embedding size (100 in this case). The Param # that model.summary() reports for the embedding layer is the size of the embedding matrix E (3000 x 100 = 300,000 in this case, since the layer has no bias). The output layer produces a softmax probability distribution over the vocabulary for the target word (its parameters are theta).
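
To see why the hidden layer acts as an embedding lookup, here is a minimal sketch in plain Python (no TensorFlow needed). It assumes a randomly filled matrix E as a stand-in for trained weights, and the helper names one_hot and embed are made up for illustration. It also works out the parameter counts you would expect model.summary() to show, assuming the output layer keeps Keras's default bias.

```python
import random

vocab_size = 3000       # same sizes as in the model above
embedding_size = 100

# Hypothetical embedding matrix E (vocab_size x embedding_size), filled with
# random values as a stand-in for trained weights.
rng = random.Random(0)
E = [[rng.uniform(-1.0, 1.0) for _ in range(embedding_size)]
     for _ in range(vocab_size)]

def one_hot(index, size):
    """Return a one-hot list with a 1.0 at `index` (illustrative helper)."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def embed(one_hot_vec, matrix):
    """Compute one_hot_vec @ matrix, as a bias-free Dense layer would."""
    return [sum(x * row[j] for x, row in zip(one_hot_vec, matrix))
            for j in range(len(matrix[0]))]

word_index = 42  # arbitrary word id chosen for illustration
embedding = embed(one_hot(word_index, vocab_size), E)

# Multiplying by a one-hot vector just selects one row of E.
assert embedding == E[word_index]

# Expected parameter counts per layer.
embedding_params = vocab_size * embedding_size             # E only, no bias
output_params = embedding_size * vocab_size + vocab_size   # theta plus bias
print(embedding_params, output_params)  # prints: 300000 303000
```

So the 300,000 parameters of the embedding layer are exactly the entries of E, and each word's embedding is one row of it.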
Hope this helps.