Hi everyone,
In the video, Learning Word Embeddings, at 9:07, Andrew mentions the following:
What researchers found was that if you really want to build a language model, it’s natural to use the last few words as context. But if your main goal is really to learn a word embedding, you can use all of these other contexts, and they will result in very meaningful word embeddings as well.
I’m a bit confused about the distinction here. Could someone clarify the difference between a language model and a word embedding model? I thought word embeddings were a part of language models—am I misunderstanding something?
Thank you in advance for your help!
Hi there,
It’s saying that you can learn a lot about how a word is used from a single sentence, and that will capture a fair portion of the word’s essence, albeit mixed with the overall message of that sentence. But by observing the word across multiple contexts, you can extract a purer signal of what it truly represents, with circumstantial factors canceling out. It’s like how the average of more and more observations converges on the true value as the sample size grows, because random fluctuations cancel each other out.
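To put a rough formula on that analogy (my own sketch, nothing from the video): if each context $i$ hands you the word’s true signal $v$ plus some context-specific noise $\epsilon_i$ with mean zero, then averaging over $n$ contexts gives

$$\frac{1}{n}\sum_{i=1}^{n}(v + \epsilon_i) = v + \frac{1}{n}\sum_{i=1}^{n}\epsilon_i,$$

and the noise term shrinks like $\sigma/\sqrt{n}$. More contexts, purer signal.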
I haven’t seen the video, but this idea is pervasive in statistical mechanics (though to be honest, I don’t know much about that either).
Cheers
Hi @Zijun_Liu, @Morrowindchamp
When Andrew says, “If you really want to build a language model, it’s natural to use the last few words as context,” he means that the focus of a language model is on predicting the next word, which naturally limits it to the immediately preceding words. However, when he says, “If your main goal is really to learn a word embedding, then you can use all these other contexts,” he means that by focusing on learning word embeddings, you don’t have to limit yourself to just the last few words. Instead, you can use other forms of context (such as nearby words at various positions, before or after the target) to train the embeddings. The result is a more flexible and comprehensive word representation (embedding), since it isn’t constrained to the strictly left-to-right context a language model uses.
So while word embeddings are indeed a component of language models, when your goal is explicitly to learn the embeddings themselves (good vector representations of each word) rather than to predict the next word as accurately as possible, you can optimize for that task with more flexible context choices instead of strictly using the last few words.
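Here’s a tiny sketch in Python (my own illustration, not code from the course) contrasting the two ways of picking a context for a target word in a toy sentence:

```python
# Toy sketch: two ways to define the "context" for the target word "juice".

sentence = "I want a glass of orange juice to go along with my cereal".split()
t = sentence.index("juice")  # position of the target word

# Language-model style: context = the last few (here 4) words *before* the
# target, since a language model predicts the next word from what precedes it.
lm_context = sentence[max(0, t - 4):t]
print(lm_context)  # ['a', 'glass', 'of', 'orange']

# Embedding style (e.g. skip-gram): context = any nearby word within a window,
# before *or* after the target.
window = 2
emb_context = [sentence[i]
               for i in range(max(0, t - window), min(len(sentence), t + window + 1))
               if i != t]
print(emb_context)  # ['of', 'orange', 'to', 'go']
```

In both cases the training pair is (context, target); only the definition of “context” changes, which is exactly the distinction Andrew is drawing.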
Hope this helps!
Thank you for the detailed answer. It’s super helpful!