[Week 2] - Embedding and Transfer Learning

Hi,

I had a few questions about word embedding and transfer learning:

1: We’ve been told that the embedding matrix is learned and that its axes are hard to decipher. So how is the word embedding vector for a previously unknown word (e.g. durian) constructed? Does a human sit and assign values based on synonyms, e.g. durian embedding vector = orange embedding vector?

2: The professor mentions that a text corpus of 100B words could be used to train the embedding matrix. What is the size of the embedding matrix? Is it 300 × the number of unique words in the corpus? Do we take a subset of this corpus and its embeddings to train the RNN, or do we take just the embeddings and hope the words from the embedding show up in the training dataset?

Thank you.

Hey @AfoDubhashi, great questions!

As a side note, a word embedding is a dense vector representation of a word, but we can also decide to embed letters, subwords, or even whole sentences instead.

To create word embeddings:

  • We need to tokenize the words of our vocabulary first. We also add a special token, [OOV] (out of vocabulary), to represent unknown words that may be seen during inference.
  • There are many ways to create embeddings from tokens. The modern approach is to pass the tokens through a neural network and then use the weights of an inner layer of the network as dense vector representations of the words (see the sketch after this list).
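
To make that concrete, here is a minimal sketch (my own toy example, not the course's code) using PyTorch's `nn.Embedding`: words are mapped to integer ids, unknown words fall back to the [OOV] id, and the ids are looked up in a learned embedding matrix. The toy vocabulary and dimension are assumptions for illustration.

```python
# Minimal sketch: map words to integer ids, reserving index 0 for [OOV],
# then look the ids up in an embedding layer whose weights are learned.
import torch
import torch.nn as nn

vocab = ["[OOV]", "the", "orange", "is", "sweet"]        # toy vocabulary
word_to_id = {w: i for i, w in enumerate(vocab)}

def tokenize(sentence):
    # Unknown words such as "durian" fall back to the [OOV] id (0).
    return [word_to_id.get(w, word_to_id["[OOV]"]) for w in sentence.split()]

embedding_dim = 8                                        # small for the example
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=embedding_dim)

ids = torch.tensor(tokenize("the durian is sweet"))      # "durian" -> 0
vectors = embedding(ids)                                 # shape: (4, 8)
print(ids.tolist(), vectors.shape)
```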

The dimensionality of the embedding is up to you. For large networks we usually choose larger embedding dimensions, because such networks have the capacity to learn more.
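
Regarding question 2 above: the embedding matrix has one row per word in the vocabulary, not per word in the corpus. Assuming, just for illustration, a 10,000-word vocabulary and 300-dimensional vectors:

```python
# Back-of-the-envelope parameter count of an embedding matrix,
# assuming (for illustration) a 10,000-word vocabulary and 300 dimensions.
vocab_size = 10_000
embedding_dim = 300
num_parameters = vocab_size * embedding_dim   # one 300-d row per vocabulary word
print(num_parameters)                         # 3,000,000 learned weights
```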

We usually learn embeddings with minibatch stochastic methods. That means we don’t need to keep the entire corpus in memory; we only need a single minibatch at a time.
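
As a rough sketch of what that looks like (a skip-gram-style setup I'm assuming for illustration, not the lecture's exact algorithm; random indices stand in for real (center, context) pairs streamed from the corpus):

```python
# Learn embeddings by predicting a context word from a center word,
# one minibatch of (center, context) index pairs at a time.
import torch
import torch.nn as nn

vocab_size, embedding_dim, batch_size = 1000, 64, 32

model = nn.Sequential(
    nn.Embedding(vocab_size, embedding_dim),  # rows of this layer are the embeddings
    nn.Linear(embedding_dim, vocab_size),     # scores over the vocabulary
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # In practice each minibatch is streamed from the corpus on disk;
    # random indices stand in for real (center, context) pairs here.
    centers = torch.randint(0, vocab_size, (batch_size,))
    contexts = torch.randint(0, vocab_size, (batch_size,))

    logits = model(centers)                   # (batch_size, vocab_size)
    loss = loss_fn(logits, contexts)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

learned_embeddings = model[0].weight          # (vocab_size, embedding_dim)
```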

Say we have a separate token for OOV or UNK. It can stand in for anything in a sentence, like a joker in a pack of cards. How are we able to know that durian is semantically similar to orange if the former only has the OOV embedding? I guess my question is: how do we construct embeddings for words that do not occur in the vocabulary, other than using an OOV token, which obviously carries no meaning?

With word-level embeddings, [OOV] is the only option. That’s one of the reasons why we tokenize into subwords in practice. If you tokenize into letters, you don’t have this problem at all, since your vocabulary is finite.
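
Here is a toy greedy longest-match tokenizer to illustrate the idea (purely illustrative; real systems use BPE, WordPiece, or SentencePiece). The effect is the same: an unseen word like durian splits into known pieces instead of collapsing to [OOV].

```python
# Toy greedy longest-match subword tokenizer over a made-up subword vocabulary.
subword_vocab = {"[OOV]", "dur", "ian", "or", "ange", "d", "u", "r", "i", "a", "n"}

def subword_tokenize(word):
    pieces, start = [], 0
    while start < len(word):
        # Take the longest vocabulary piece that matches at this position.
        for end in range(len(word), start, -1):
            if word[start:end] in subword_vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            pieces.append("[OOV]")   # no piece matches even a single character
            start += 1
    return pieces

print(subword_tokenize("durian"))    # ['dur', 'ian']
print(subword_tokenize("orange"))    # ['or', 'ange']
```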

Noted. Thank you. Is there an article or other literature I should be reading to better understand this? I guess I don’t have a firm grasp on this topic.

In my opinion, Speech and Language Processing is the best NLP book at the moment.
