Hi Everyone,
Suppose that, while using CBOW, the words “happy” and “sad” have the same context words. Then, according to Word2Vec, they will end up near each other in vector space even though they differ in meaning.
That is a good question. I think the confusion lies in the difference between synonymy and similarity.
You’re correct that Word2Vec is based on the distributional hypothesis: similarity is defined by how words are distributed (or, in simple words, words that occur in similar contexts tend to have similar meanings). But note that similarity is not the same as synonymy.
Two words are synonymous if they can be substituted for one another in any sentence without changing the “truth” of the sentence. For example, “car” and “automobile”, or “water” and “H2O”, have the same propositional meaning: you can swap one for the other in most sentences without altering the “truth”.
Most words don’t have many synonyms, but they share many other kinds of similarity with other words. For example, “H2O” and “automobile” might both be more common in scientific documents and hence more “similar” in that respect. Or, “cat” is not a synonym of “dog”, but it is more “similar” to “dog” than to “airplane”.
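You can check this yourself with pretrained vectors. A quick sketch, assuming gensim is installed and can download the small “glove-wiki-gigaword-50” vectors (my choice here only because they are a small download; Word2Vec vectors behave the same way, and the exact scores vary by model):

```python
# Sketch: comparing similarity scores with pretrained vectors.
# Assumes gensim and an internet connection to fetch the vectors.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")

print(vectors.similarity("cat", "dog"))       # high: many shared contexts
print(vectors.similarity("cat", "airplane"))  # noticeably lower
print(vectors.similarity("happy", "sad"))     # high, as the question suspected
```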
Loosely speaking, that is why we let the embeddings have as many dimensions as we think are necessary. Every word is represented as a point in this multi-dimensional space, so that the model could choose one dimension to represent “happiness”, another to represent “adjectiveness”, and so on.
So the words “happy” and “sad” would be near each other in vector space on “adjectiveness” and probably many other dimensions, but would have opposite values on “happiness”. Overall, though, they would still be pretty similar, as the toy sketch below illustrates.
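Here is a minimal numpy sketch of that geometry, with hand-made vectors and invented dimension labels (real learned dimensions are nowhere near this interpretable):

```python
# Toy illustration with hand-made 4-d "embeddings". We pretend
# dim 0 = "adjectiveness", dim 1 = "happiness",
# dim 2 = "emotion-related", dim 3 = "vehicle-related".
import numpy as np

happy    = np.array([0.9,  0.5, 0.8, 0.0])
sad      = np.array([0.9, -0.5, 0.8, 0.0])  # differs only on "happiness"
airplane = np.array([0.0,  0.0, 0.0, 0.9])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(happy, sad))       # ~0.71: close despite opposite sentiment
print(cosine(happy, airplane))  # 0.0: nothing in common
```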
For a concrete language example:
I feel so ______. I’m crying.
You would probably agree that the word “happy” is more probable here than most other English words like “nutrition” or “PC”, even though “sad” fits best. That shared context is exactly why “happy” and “sad” end up close together.
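If you want to poke at this mechanic, gensim’s Word2Vec exposes a predict_output_word method that does this CBOW-style “which word fits the context” scoring. The corpus below is invented and far too small to learn anything reliable; it is only a sketch:

```python
# Sketch only: a CBOW model trained on an invented toy corpus
# (an assumption for illustration; real models need huge corpora).
from gensim.models import Word2Vec

corpus = [
    ["i", "feel", "so", "happy", "i", "am", "smiling"],
    ["i", "feel", "so", "sad", "i", "am", "crying"],
    ["nutrition", "labels", "list", "calories"],
] * 100  # repeat the patterns so the tiny model sees them often

# sg=0 selects CBOW; negative sampling (the default) is required
# for predict_output_word to work.
model = Word2Vec(corpus, vector_size=20, window=3, min_count=1, sg=0)

# Ask the model which words are probable given these context words.
print(model.predict_output_word(["i", "feel", "so", "crying"], topn=5))
```

With real training data, both “happy” and “sad” would score far above unrelated words like “nutrition” in contexts like this, which is precisely why they end up close in vector space.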