Why do we use the dot product as the method to find the relevance between word embeddings?

Hi @hiimbach,

thanks for your question.

The dot product helps the model figure out which words are most relevant to each other. In your example the model might focus on "don't" and "like" because of the given context, not because these words are semantically similar.
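To make that concrete, here is a minimal sketch of how a dot product turns into attention weights. The toy 4-dimensional vectors and the word labels in the comments are invented purely for illustration, not taken from any real model:

```python
import numpy as np

def dot_product_scores(query, keys):
    """Relevance of each key vector to the query via scaled dot product.

    A larger dot product means the key is more aligned with the query in
    the learned space, so it receives more attention after the softmax.
    """
    d_k = query.shape[-1]
    scores = keys @ query / np.sqrt(d_k)       # one raw score per key
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    return weights / weights.sum()

# Hypothetical projected vectors, just for the example:
query = np.array([1.0, 0.0, 1.0, 0.0])             # e.g. a projection of "like"
keys = np.array([[1.0, 0.2, 0.9, 0.1],             # e.g. a projection of "don't"
                 [0.1, 1.0, 0.0, 0.8],             # an unrelated word
                 [0.9, 0.1, 1.1, 0.0]])            # a related word

print(dot_product_scores(query, keys))  # highest weight on the most aligned key
```

The point is that the alignment is learned from context during training, so two words can get a high score for one head even if they are not synonyms.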

Here you can find a nice explanation of multi-head attention, touching upon the different focal points of the heads: Multi-headed Attention the mathematical meaning - #2 by arvyzukai

Word2Vec also relies on patterns learned from data, but it produces a single embedding space in which semantically similar or synonymous words end up with similar embedding vectors, i.e. close to each other (comparable to what one specific attention head can learn, as explained in the forum link above).
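As a small sketch of that idea, the snippet below measures closeness with cosine similarity. The 4-dimensional vectors and the word labels are made up for illustration; real Word2Vec embeddings have hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors:
    close to 1.0 for similar directions, lower for unrelated ones."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings, purely for illustration:
good = np.array([0.9, 0.1, 0.8, 0.2])
great = np.array([0.8, 0.2, 0.9, 0.1])
car = np.array([0.1, 0.9, 0.0, 0.7])

print(cosine_similarity(good, great))  # high: near-synonyms sit close together
print(cosine_similarity(good, car))    # lower: unrelated words are farther apart
```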

So, fundamentally, multi-head attention is much more geared towards how human beings understand and process words in context to draw conclusions.

Hope that helps!

Best regards
Christian