If you dotted a word's vector with itself, the dot product would be large, so the model would tend to predict each word as its own context, which I guess is not the desired result. That's a good point!
Just because one word is most similar to another does not mean we should predict it as the context of the target.
An example I can think of is that different parts of speech of the same root word are not good predictions: if the target word is "strong", the context word is probably not "strongly", even though it is one of the closest words by cosine similarity.
We need the theta parameters to account for the extra complexity involved in prediction (similarity alone does not result in good predictions).
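To make that concrete, here is a minimal NumPy sketch of the idea as I understand it, assuming the theta parameters are the separate "output"/context embedding matrix of skip-gram word2vec (the vocabulary size, dimension, and variable names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 4

# Target ("input") vectors, normalized so each word's self-similarity is 1.
V = rng.normal(size=(vocab_size, dim))
V /= np.linalg.norm(V, axis=1, keepdims=True)

# A separate set of context ("output") parameters -- the theta above.
# These start random and are free to move toward words that actually
# co-occur with the target during training.
theta = rng.normal(size=(vocab_size, dim))

def context_probs(target_idx):
    """P(context | target): softmax over dot products with theta."""
    scores = theta @ V[target_idx]
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# If we reused V for both roles, every word would score highest as its
# own context, since v . v = 1 beats any other cosine similarity:
shared_scores = V @ V[2]
print(shared_scores.argmax())   # -> 2, i.e. the word "predicts" itself

# With a separate theta, the top-scoring context is decoupled from the
# target's own vector (here it is whatever the random init favors):
print(context_probs(2).argmax())
```

The point of the two matrices is that they let the model separate "what a word means" from "what a word appears next to", so training can push down the score of strong → strongly even while their target vectors stay close.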
I understand now, thank you so much!