I have a question on the lecture on sentiment classification in week 2 of the sequence models course. In particular on slide 2:
In the beginning Andrew says that for the sentence “The dessert is excellent” we assume a vocabulary of 10000 words (as in all the examples), so a one-hot representation of one word would be a vector of dimension (10000, 1).
Then, when introducing the embedding matrix E with 300 embedding features, he mentions that it could be trained on a much larger dataset, e.g. one made up of 1 billion words.
From what I understand, E would then be a matrix of dimension (300, 1000000000), i.e. one embedding for every word in the dataset.
So the matrix multiplication does not work. In other words: how do I find one particular word of a smaller vocabulary in the embedding matrix of a much larger corpus?
I am sure I am getting something wrong. I would be happy if someone could go into a bit more detail on this.
Thank you so much for your help!
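For concreteness, the lookup described on the slide can be sketched in numpy. The sizes (300 features, 10000-word vocabulary) are from the lecture; the random matrix and the word index are illustrative placeholders:

```python
import numpy as np

# Sizes from the lecture: 300 embedding features, 10000-word vocabulary.
embedding_dim, vocab_size = 300, 10_000

rng = np.random.default_rng(0)
E = rng.standard_normal((embedding_dim, vocab_size))  # embedding matrix, shape (300, 10000)

# One-hot vector for a word at an arbitrary index (8252 is just an example).
o = np.zeros((vocab_size, 1))
o[8252] = 1.0

# E @ o simply selects column 8252 of E -- that word's embedding.
e = E @ o  # shape (300, 1)
assert np.allclose(e[:, 0], E[:, 8252])
```

Note that in practice nobody performs this multiplication; frameworks just index the column directly (`E[:, 8252]`), which is why the mapping from word to index matters so much.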
Please provide a link to the lecture and the timestamp.
There are two techniques people use when dealing with pretrained word embeddings:
- Initialize the embedding layer with the pretrained embedding weights, and map the words in your training/test data to the word indices of the pretrained embedding so that the lookup works properly.
- Create a new embedding matrix and initialize its weights with the pretrained weights, but only for the words in your training corpus. The advantage of this approach over the previous one is that the embedding matrix can be much smaller, which matters if space is a concern.
For words in your corpus that are outside the pretrained embedding vocabulary, evaluate the following approaches:
- Map these words to an OOV (out-of-vocabulary) token.
- Initialize to random weights.
- Initialize to zeros.
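The three OOV options above can be sketched like this. Again a hedged sketch: the `pretrained` dict, the `embed` helper, and the 0.01 random scale are all illustrative choices, not something prescribed by the lecture:

```python
import numpy as np

embedding_dim = 300
rng = np.random.default_rng(42)

# Toy pretrained store: only this word has a vector.
pretrained = {"dessert": rng.standard_normal(embedding_dim)}

# Option 1: a single shared <OOV> vector for every unknown word
# (the 0.01 scale is an illustrative choice).
oov_vec = rng.standard_normal(embedding_dim) * 0.01

def embed(word, oov_strategy="oov_token"):
    if word in pretrained:
        return pretrained[word]
    if oov_strategy == "oov_token":  # option 1: shared <OOV> token
        return oov_vec
    if oov_strategy == "random":     # option 2: fresh random weights per word
        return rng.standard_normal(embedding_dim) * 0.01
    return np.zeros(embedding_dim)   # option 3: zeros
```

With the shared-token strategy, every unknown word maps to the same vector, so `embed("foo")` and `embed("bar")` are identical; with random initialization each unknown word gets its own (trainable) starting point.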
When he says that the embedding model was trained on a billion words, that probably refers to the size of the training corpus, not the size of the vocabulary. Are there a billion unique words in English? I doubt it. But even if he is talking about the vocabulary, you will have to subset the embedding to include only the words in your vocabulary, as Balaji describes.
Hi Balaji, Hi Paul,
thank you for your help! I think I understand it now.
Perhaps you could consider adding a note to the lecture, since it does not become clear from the slides alone that an intermediate step is required to make the pretrained embeddings compatible with an individual problem.
Thank you again for your efforts!
You’re welcome, Max. The staff have been informed about your suggestion.