About the quiz question in lecture "Manipulating Words in Vector Spaces"

In this lecture, there is a quiz:
[quiz image]
If I follow the method in the lecture, the corresponding vector I should find is
USA(5,6) - Washington(10,5) + Ankara(9,1) = (4, 2).
The Euclidean distance from (4, 2) to Japan(4,3) is 1, and to Turkey(3,1) it is 1.41.
Even if we take the cosine similarity of (Japan - Ankara) = (-5, 2) with (USA - Washington) = (-5, 1), it is a bit larger than that of (Turkey - Ankara) = (-6, 0) with (USA - Washington).
Why is the answer Turkey with d=1.41 rather than Japan with d=1? What did I misunderstand?

Thanks.

Hi, @Jack_Changfan.

The question has two parts:

What country? - (for that use cosine similarity: Turkey 0.9899 vs Japan 0.9839)
What distance? - (Euclidean distance: Turkey 1.41 vs Japan 1)

  # country = Ankara(9,1) - Washington(10,5) + USA(5,6) = (4, 2)

  # Cosine similarity: country (4, 2); Turkey(3, 1) = 0.9899
  print((4*3 + 2*1) / ((4**2 + 2**2)**0.5 * (3**2 + 1**2)**0.5))

  # Cosine similarity: country (4, 2); Japan(4, 3) = 0.9839
  print((4*4 + 2*3) / ((4**2 + 2**2)**0.5 * (4**2 + 3**2)**0.5))

So the correct answer is Turkey (first part) with d=1.41 (second part).
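For the distance part, the same kind of quick check (just the raw coordinates plugged into the Euclidean distance formula):

  # Euclidean distance: country (4, 2) to Turkey(3, 1) = 1.41
  print(((4 - 3)**2 + (2 - 1)**2)**0.5)

  # Euclidean distance: country (4, 2) to Japan(4, 3) = 1.0
  print(((4 - 4)**2 + (2 - 3)**2)**0.5)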

Thanks for the clarification.
One more question about the similarity we use here.
Since we are answering the analogy USA : DC = X : Ankara, I assumed that the vector X - Ankara should be similar to USA - DC. So I use:
  import numpy as np

  def cos_sim(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  USA, DC = np.array([5, 6]), np.array([10, 5])
  JPN, TUR, ANK = np.array([4, 3]), np.array([3, 1]), np.array([9, 1])

  USA_DC = USA - DC    # [-5, 1]
  JPN_ANK = JPN - ANK  # [-5, 2]
  TUR_ANK = TUR - ANK  # [-6, 0]
  print(cos_sim(USA_DC, JPN_ANK))  # 0.9832820049844603
  print(cos_sim(USA_DC, TUR_ANK))  # 0.9805806756909202
In this model, JPN - ANK has a higher similarity to USA - DC.
In your explanation, you calculate the target vector first and then compare similarities to that target vector, which produces a completely different result. But what is wrong with trying to maximize the cosine similarity between USA - DC and X - Ankara?

One disadvantage of using the word vectors themselves (measured from the origin) for the cosine similarity calculation is that once we move the reference point (the origin), the result changes completely. With USA - DC and X - Ankara, the result is the same no matter where the reference point is. Does what I'm thinking make sense?
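For example, a quick numerical sketch of what I mean (the shift vector here is an arbitrary choice, just to move the reference point):

  import numpy as np

  def cos_sim(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  USA, DC = np.array([5, 6]), np.array([10, 5])
  TUR, ANK = np.array([3, 1]), np.array([9, 1])
  shift = np.array([100, -50])  # move every point, i.e. change the reference origin

  # Raw word vectors: the cosine similarity changes when all points are shifted.
  print(cos_sim(USA, TUR), cos_sim(USA + shift, TUR + shift))

  # Difference vectors: the shift cancels out, so the similarity stays the same.
  print(cos_sim(USA - DC, TUR - ANK),
        cos_sim((USA + shift) - (DC + shift), (TUR + shift) - (ANK + shift)))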

Thanks.

Yes, it kind of makes sense if I understand you correctly. But the question is:

Use the method presented in the previous slide to predict which is the country whose capital is Ankara.

In the lecture video, when we try to predict the capital of Russia, we go from country to capital - that is, USA->DC and Russia->? - so we calculate the difference vector from country to capital ([5, -1] in that case: 5 to the right and 1 down).
When we try to find the country from the capital, we should instead estimate DC->USA and Ankara->?

So, we should try to find the vector pointing TO the country we want to find (from the reference point of Ankara):
USA - Washington = [-5, 1] # (5 to the left, and 1 up)
Unknown_country - Ankara = [?, ?]

So we plug in the DC->USA vector to find this country:
Unknown_country - Ankara = [-5, 1] # (what country is 5 to the left, and 1 up?)
which should be:
Unknown_country = Ankara - Washington + USA
Unknown_country = [4, 2] # here cosine similarity would be 1, but there is no country here.

So we find the cosine similarity to Japan and Turkey (calculations above), and I think this is what the question asks.
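Putting the whole procedure into code, here is a minimal sketch with the same toy coordinates (the two-entry candidates dictionary is of course made up just for this quiz):

  import numpy as np

  def cos_sim(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  USA, Washington, Ankara = np.array([5, 6]), np.array([10, 5]), np.array([9, 1])
  candidates = {"Japan": np.array([4, 3]), "Turkey": np.array([3, 1])}

  prediction = Ankara - Washington + USA  # [4, 2]
  # Pick the candidate country whose word vector is most cosine-similar to the prediction.
  answer = max(candidates, key=lambda w: cos_sim(candidates[w], prediction))
  print(answer)                                           # Turkey
  print(np.linalg.norm(candidates[answer] - prediction))  # 1.4142135623730951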

Now, to your question:
What you are calculating is something different:
usa_wdc_vec = usa - wdc # [-5, 1]
jpn_ank_vec = jpn - ank # [-5, 2] (cos_sim = 0.9833)
tur_ank_vec = tur - ank # [-6, 0] (cos_sim = 0.9806)

You are searching for the difference vector that is most similar to the usa - wdc difference. So yes, it is the case that "5 to the left and 2 up" is more cosine-similar than "6 to the left and 0 up/down". But this is not the method presented in the slide, and I'm not sure I understand why this method would be superior. Why would you say:

While taking USA-DC and X-Ankara, no matter where the reference point is, the result would be the same.

In particular, if you change any single point (USA, DC, JPN, TUR, ANK), the results would be different in both methods, wouldn't they?
If you are saying that you would transform the space (move [0, 0], or transform it in any other way, like skewing or rotating), then my answer is that the whole point of this space is to be exactly as it is, and such a transformation makes a huge difference.

I see. Thanks for your explanation.

Mmm, another question about this.
Say we have another word X(400, 200), which is far from the country names. With this cosine similarity we would choose it over either Japan or Turkey, since its cosine with (4, 2) is 1, right? But if we use Ankara - X and Washington - USA, their similarity would be small (-0.785), which would exclude this false match, right?
Sorry, just trying to explore the pros and cons of different ways of modeling this.
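For reference, the quick check behind those two numbers (X is of course a made-up point, and cos_sim is the same helper as above):

  import numpy as np

  def cos_sim(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  USA, Washington, Ankara = np.array([5, 6]), np.array([10, 5]), np.array([9, 1])
  X = np.array([400, 200])                 # hypothetical far-away word
  prediction = Ankara - Washington + USA   # [4, 2]

  print(cos_sim(X, prediction))                 # ~1.0 -> X would win under the slide's method
  print(cos_sim(X - Ankara, USA - Washington))  # ~-0.785 -> X is ruled out by the difference-vector comparison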

No problem - now I understand your previous question better.

To rephrase it - my main point was that you have to calculate the cosine similarity from Ankara and not from the countries, because the question asks 'which is the country whose capital is Ankara' (using the method in the previous slide) and not 'which capital is closest to Japan or Turkey or some other country'.

I see that your intuition lies with Euclidean distance, and that is where the confusion comes from. So I would ask a counter-question: what about a point Z(-500, 100)? In this case the cosine similarity of Z - Ankara with USA - Washington would be almost exactly 1 (and I think you would still consider Z a worse answer than Turkey(-6, 0)).
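Quick check (Z is, of course, a made-up point; cos_sim is the same helper as before):

  import numpy as np

  def cos_sim(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  USA, Washington, Ankara = np.array([5, 6]), np.array([10, 5]), np.array([9, 1])
  Z = np.array([-500, 100])

  print(cos_sim(Z - Ankara, USA - Washington))            # ~0.99998 -> nearly perfect by the difference-vector comparison
  print(np.linalg.norm(Z - (Ankara - Washington + USA)))  # ~513 -> yet hopelessly far from the predicted point (4, 2)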

Cosine similarity is all about the direction of the vector arrow: anything that lies on it, or closest to it in angle, scores highest (no matter how far or near it is in Euclidean space), as explained in the previous video about the 'Food corpus' vs 'Agriculture corpus' vs 'History corpus'.

In reality, you could combine Euclidean distance with cosine similarity to get a hybrid score, or use any of the many other similarity metrics out there.
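For example, one very simple (and entirely made-up) way to blend the two, just to illustrate the idea - the alpha weighting and the 1/(1 + distance) squashing are arbitrary choices, not something from the lecture:

  import numpy as np

  def cos_sim(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  def hybrid_score(candidate, prediction, alpha=0.5):
      # Blend direction agreement (cosine similarity) with closeness (1 / (1 + Euclidean distance)).
      distance = np.linalg.norm(candidate - prediction)
      return alpha * cos_sim(candidate, prediction) + (1 - alpha) / (1 + distance)

  prediction = np.array([4, 2])
  for name, vec in {"Japan": np.array([4, 3]), "Turkey": np.array([3, 1])}.items():
      print(name, hybrid_score(vec, prediction))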

Cheers!
