Hi everyone,
In the video, Learning Word Embeddings, at 9:07, Andrew mentions the following:
What researchers found was that if you really want to build a language model, it’s natural to use the last few words as context. But if your main goal is really to learn a word embedding, you can use all of these other contexts, and they will result in very meaningful word embeddings as well.
I’m a bit confused about the distinction here. Could someone clarify the difference between a language model and a word embedding model? I thought word embeddings were a part of language models—am I misunderstanding something?
Thank you in advance for your help!
Hi there,
It’s saying that you can learn a lot about how a word is used from a single sentence, and that will capture a fair portion of the word’s essence, albeit mixed with the overall message of that sentence. But by observing the word across multiple contexts, you can extract a purer signal of what it truly represents, with circumstantial factors canceling out. It’s like how the average of more and more observations converges on the true value as the sample size grows, because random fluctuations cancel each other out.
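To put a rough formula on that analogy (my own sketch, nothing from the video): if each context $i$ hands you the word’s true signal $v$ plus some context-specific noise $\epsilon_i$ with mean zero, then averaging over $n$ contexts gives

$$\frac{1}{n}\sum_{i=1}^{n}(v + \epsilon_i) = v + \frac{1}{n}\sum_{i=1}^{n}\epsilon_i,$$

and the noise term shrinks like $\sigma/\sqrt{n}$. More contexts, purer signal.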
I haven’t seen the video, but this idea is pervasive in statistical mechanics (though to be honest, I don’t know much about that either).
Cheers
Hi @Zijun_Liu, @Morrowindchamp
When Andrew says, “If you really want to build a language model, it’s natural to use the last few words as context,” he means that the focus of a language model is on predicting the next word, which naturally limits it to the immediately preceding words. However, when he says, “If your main goal is really to learn a word embedding, then you can use all these other contexts,” he means that by focusing on learning word embeddings, you don’t have to limit yourself to just the last few words. Instead, you can use other forms of context (such as nearby words at various positions, before or after the target) to train the embeddings. The result is a more flexible and comprehensive word representation (embedding), since it isn’t constrained to the strictly left-to-right context a language model uses.
So while word embeddings are indeed a component of language models, when your goal is explicitly to learn the embeddings themselves (good vector representations of each word) rather than to predict the next word as accurately as possible, you can optimize for that task with more flexible context choices instead of strictly using the last few words.
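Here’s a tiny sketch in Python (my own illustration, not code from the course) contrasting the two ways of picking a context for a target word in a toy sentence:

```python
# Toy sketch: two ways to define the "context" for the target word "juice".

sentence = "I want a glass of orange juice to go along with my cereal".split()
t = sentence.index("juice")  # position of the target word

# Language-model style: context = the last few (here 4) words *before* the
# target, since a language model predicts the next word from what precedes it.
lm_context = sentence[max(0, t - 4):t]
print(lm_context)  # ['a', 'glass', 'of', 'orange']

# Embedding style (e.g. skip-gram): context = any nearby word within a window,
# before *or* after the target.
window = 2
emb_context = [sentence[i]
               for i in range(max(0, t - window), min(len(sentence), t + window + 1))
               if i != t]
print(emb_context)  # ['of', 'orange', 'to', 'go']
```

In both cases the training pair is (context, target); only the definition of “context” changes, which is exactly the distinction Andrew is drawing.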
Hope this helps!
Thank you for the detailed answer. It’s super helpful!