How to understand this description of word embeddings?

zhuyuanxuan · February 19, 2024, 10:54am

When learning word embeddings, we create an artificial task of estimating P(target∣context). It is okay if we do poorly on this artificial prediction task; the more important by-product of this task is that we learn a useful set of word embeddings.

gent.spah · February 19, 2024, 1:07pm

When building a natural language model, it's important to have good embeddings. If it's just for the sake of learning how embeddings are created, it doesn't matter much, I guess.

paulinpaloalto · February 19, 2024, 11:07pm

Can you give us a reference to where Prof Ng makes that statement (i.e. which lecture and the time offset would also be helpful)? I tried searching the transcripts of several of the lectures in C5 W2 and couldn’t find that statement, although it does sound familiar. I’d like to listen to all that he says about it and hope to be able to offer some interpretation.

Without hearing the lecture again, my interpretation would be that we need a metric or cost function to train a model, so for training a word embedding the conditional probability that he shows there is a common choice. But there will undoubtedly be cases in which there could be a lot of words that would make sense or could occur in a particular position in a given sentence. Or to put it in the same terms of the probability expression: with some contexts, there could be many possible “correct” answers. So in other words, the probability of correct prediction in a case like that is not too high. But by trying to maximize it, we get useful training even if the maximum value we can achieve is not very high. Of course then the next question is how you can quantify whether the word embeddings you learn by that training process actually are useful. I’m sure Prof Ng also addresses that point and am hoping it will be clarified by listening to the relevant lecture again.
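To make the "many possible correct answers" point concrete, here is a minimal sketch in Python. It is a toy illustration under stated assumptions, not the method from the lecture: the vocabulary, the single fixed context, and the `train_logits` helper are all hypothetical. One context is followed by four equally frequent targets, and a softmax over the vocabulary is fitted by gradient descent on the cross-entropy loss.

```python
import numpy as np

# Hypothetical toy data: one fixed context ("a glass of ___") followed
# equally often by four plausible words; "car" and "sky" never occur.
vocab = ["juice", "water", "milk", "wine", "car", "sky"]
targets = np.array([0, 1, 2, 3] * 25)  # indices into vocab, 100 examples


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def train_logits(targets, vocab_size, lr=0.5, steps=2000):
    """Fit P(target | context) for a single context by gradient descent.

    The gradient of the mean cross-entropy with respect to the logits is
    (predicted distribution - empirical distribution of the targets).
    """
    counts = np.bincount(targets, minlength=vocab_size)
    empirical = counts / counts.sum()
    logits = np.zeros(vocab_size)
    for _ in range(steps):
        probs = softmax(logits)
        logits -= lr * (probs - empirical)
    return softmax(logits)


probs = train_logits(targets, len(vocab))

# Average probability the trained model assigns to the word that actually occurred.
avg_p_correct = probs[targets].mean()

print(dict(zip(vocab, np.round(probs, 3))))
print("average P(target | context):", round(float(avg_p_correct), 2))  # roughly 0.25
```

Even after training, the probability of the "correct" word stays near 1/4, because four words are equally valid in this context. The model has still learned something useful: it separates the plausible words from the implausible ones, and in a real embedding model that signal is what shapes the embedding vectors.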

zhuyuanxuan · February 20, 2024, 3:49am

I don't know whether I can share the reference for this statement. There is no other background or context, only this one sentence.

But by trying to maximize it, we get useful training even if the maximum value we can achieve is not very high.

Anyway, your explanation makes sense. Thank you very much.
