Word embedding parameters + transfer learning

I don’t really understand what parameters are learned in word-embedding models like word2vec, or how transfer learning works for them. The goal is to learn the word embedding matrix E, but the hidden-layer and softmax-layer parameters are also being learned to predict the target word or context, right? So when we use transfer learning, are we just taking the word embedding matrix E and using it in a different application, or are we taking the whole word2vec model to predict contexts or targets for a different set of words?

Word embeddings are learnt from large corpora such as Wikipedia; word2vec is one algorithm that learns these embeddings. Transfer learning on learnt word embeddings can be done in the following ways:

  1. Take the word embeddings and use them directly in your problem. Don’t train this embedding layer; train only your problem-specific layers.
  2. Fine-tune the embeddings for your particular problem by initializing an embedding layer with the word embeddings and training on your dataset.
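The difference between the two options can be sketched in plain NumPy (the matrix values, learning rate, and gradient here are toy placeholders, not from any real model): with frozen embeddings you only ever look rows up, while fine-tuning treats E itself as a trainable parameter whose looked-up rows receive gradient updates.

```python
import numpy as np

# Toy pretrained embedding matrix: VOCAB = 4 words, EMBEDDING_DIM = 3.
E = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6],
              [0.7, 0.8, 0.9],
              [1.0, 1.1, 1.2]])

def embed(indices, E):
    """Option 1: frozen embeddings -- pure lookup, E is never updated."""
    return E[indices]                    # shape (len(indices), EMBEDDING_DIM)

def fine_tune_step(E, indices, grad, lr=0.1):
    """Option 2: fine-tuning -- one (made-up) gradient step on E.

    Only the rows that were looked up in this batch receive gradients;
    all other word vectors stay at their pretrained values.
    """
    E = E.copy()
    E[indices] -= lr * grad
    return E

vectors = embed(np.array([1, 3]), E)          # embeddings for words 1 and 3
E_tuned = fine_tune_step(E, np.array([1, 3]), np.ones((2, 3)))
```

After `fine_tune_step`, rows 1 and 3 of `E_tuned` have moved, while rows 0 and 2 are unchanged from the pretrained matrix.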

Just to clarify: in the word2vec model, the learnt word embeddings are only the parameters of the first layer, not of the entire word2vec model, right?

Sorry. I don’t understand your question. How can the entire model be the first layer of itself?

When doing transfer learning with word embeddings, you’ll need 2 things from the provider:

  1. An embedding matrix that holds a multidimensional vector for each word.
  2. A word list with the words in the right order: if the embedding matrix has dimensions VOCAB x EMBEDDING_DIM, the words must be listed in the order of the rows that correspond to them.
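A minimal sketch of those two ingredients (the words and matrix values here are toy examples, not a real pretrained set): the ordered word list is what lets you recover which row of the matrix belongs to which word.

```python
import numpy as np

# What the embedding provider gives you (toy example):
words = ["the", "cat", "sat", "mat"]              # order matches the matrix rows
E = np.arange(12, dtype=float).reshape(4, 3)      # VOCAB x EMBEDDING_DIM

# Build the word -> row-index mapping from the ordered word list.
word_to_index = {w: i for i, w in enumerate(words)}

# Row i of E is the embedding for words[i].
vec = E[word_to_index["cat"]]                     # row 1 of E
```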

Here are the steps for transfer learning:

  1. Include the embedding matrix as the weights of an Embedding layer, which is the 1st layer of your model. If you don’t want these embeddings to be tuned, set the trainable parameter of this layer to False.
  2. For every input sentence, map each word to its index in the word list. This index is what retrieves the corresponding embedding from the embedding matrix; the lookup itself is done by the Embedding layer, so you just have to provide the right index.
  3. Add additional layers to your model AFTER the embedding layer.
  4. Compile & train.
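The steps above can be sketched in Keras (the vocabulary size, dimensions, and layers after the embedding are illustrative assumptions, not a prescribed architecture):

```python
import numpy as np
import tensorflow as tf

VOCAB, EMBEDDING_DIM, SEQ_LEN = 4, 3, 5
E = np.random.rand(VOCAB, EMBEDDING_DIM).astype("float32")  # pretrained matrix

model = tf.keras.Sequential([
    # Step 1: Embedding layer initialized with the pretrained matrix,
    # frozen so the embeddings are not tuned during training.
    tf.keras.layers.Embedding(
        input_dim=VOCAB,
        output_dim=EMBEDDING_DIM,
        embeddings_initializer=tf.keras.initializers.Constant(E),
        trainable=False,
    ),
    # Step 3: problem-specific layers AFTER the embedding layer.
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Step 4: compile & train.
model.compile(optimizer="adam", loss="binary_crossentropy")

# Step 2 happens at input time: sentences arrive as arrays of word indices.
batch = np.array([[0, 1, 2, 3, 0]])   # one sentence of SEQ_LEN indices
probs = model(batch)                  # shape (1, 1)
```

Because the Embedding layer is frozen, only the Dense layer’s weights would change during `model.fit`; setting `trainable=True` instead would fine-tune the embeddings as well.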