Question about Embeddings and Cosine Similarity

Hello,

When using cosine similarity, the vectors are effectively normalized: cos(a, b) = dot(a, b) / (norm(a) * norm(b))
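For concreteness, this is how I compute it (a quick NumPy sketch; the function name is just mine):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(a, b) = dot(a, b) / (norm(a) * norm(b))
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```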

1/ It means that basically all embeddings end up on an arc with radius 1, so the only thing that matters is proximity along that arc. So this kind of proximity (the one in the picture, where two points are close in space but at different distances from the origin), unless materialized through a projection onto the arc, is not captured.
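For example (a quick NumPy check with made-up numbers), two vectors pointing in the same direction but with very different norms still get similarity 1:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = 10.0 * a  # same direction, 10x the magnitude

cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos)  # 1.0; the radial distance between a and b is invisible
```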

2/ Regarding separation planes: from what I see in the course, all planes pass through the origin (0, 0), which means you can’t have a “triangle” region like the one highlighted in the course.
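To make what I mean concrete (a quick NumPy sketch; the numbers are made up):

```python
import numpy as np

w = np.array([1.0, -1.0])
x = np.array([2.0, 1.0])

# A plane through the origin classifies by sign(w @ x). Scaling x never
# flips the sign, so every decision region is an unbounded cone through
# (0, 0) and no bounded "triangle" region seems possible.
print(np.sign(w @ x), np.sign(w @ (100 * x)))          # same sign

# With a bias term, the plane w @ x + b = 0 no longer passes through
# the origin, and intersecting several such planes can bound a region.
b = -3.0
print(np.sign(w @ x + b), np.sign(w @ (100 * x) + b))  # signs differ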

I am aware that I am probably confusing something, but it is not at all clear to me where.

3/ If cosine similarity is used for comparing embeddings, why aren’t the embeddings normalized from the get-go? How / when would you use the fact that embeddings are not normalized?
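In other words, why not do something like this once, up front (a sketch; the matrix shape is invented):

```python
import numpy as np

E = np.random.randn(10_000, 512)  # toy embedding matrix, made-up shape
E_unit = E / np.linalg.norm(E, axis=1, keepdims=True)

# After row-normalization, cosine similarity reduces to a dot product:
scores = E_unit @ E_unit[0]  # similarity of every embedding to row 0
```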

Thank you for clarifying! :slight_smile:

Hi @Alex_Gris

Yes, this kind of distribution of datapoints (like in the picture) is usually not what you see. It is common in “plain old” linear regression but not in large language models: especially in high dimensions, all features are usually normalized. Some more interesting content on the matter.
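One common form of this normalization inside transformer layers is layer normalization; here is a bare-bones sketch (omitting the learned gain and bias):

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Rescale each feature vector to zero mean and unit variance
    # along its last axis, as done throughout transformer blocks.
    mean = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```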

Yes, usually this is not the way “separation” happens. It is usually done with rotation matrices (see the figure from the Reformer paper).
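Roughly, vectors get hashed by their angle using one shared random rotation; a minimal sketch of that scheme (the dimension and bucket count are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_buckets = 64, 8

# One shared random projection defines the hash, as in the Reformer's
# angular LSH: h(x) = argmax([xR ; -xR]).
R = rng.standard_normal((d, n_buckets // 2))

def lsh_bucket(x: np.ndarray) -> int:
    xR = x @ R
    return int(np.argmax(np.concatenate([xR, -xR])))

x = rng.standard_normal(d)
print(lsh_bucket(x), lsh_bucket(2.0 * x))  # same bucket: only the angle matters
```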

They are normalized from the get-go (the weights are initialized from a normal distribution with mean 0 and variance 1).
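For example (a PyTorch sketch; the vocabulary size and dimension are invented), `torch.nn.Embedding` draws its initial weights from exactly that distribution:

```python
import torch.nn as nn

emb = nn.Embedding(30_000, 512)  # made-up vocab size and dimension

# By default, PyTorch fills these weights from N(0, 1),
# i.e. mean 0 and variance 1.
print(emb.weight.mean().item(), emb.weight.var().item())  # ≈ 0, ≈ 1
```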

Cheers