About the quiz question in lecture "Manipulating Words in Vector Spaces"

In this lecture, there is a quiz:
[quiz image]
If I follow the method in the lecture, the corresponding vector I should find is
USA(5,6) - Washington(10,5) + Ankara(9,1) = (4, 2).
The Euclidean distance from (4, 2) to Japan(4,3) is 1, and to Turkey(3,1) it is 1.41.
Even if we take the cosine similarity of (Japan - Ankara) = (-5, 2) with (USA - Washington) = (-5, 1), it is a bit larger than that of (Turkey - Ankara) = (-6, 0) with (USA - Washington).
Why is the answer Turkey with d=1.41 rather than Japan with d=1? What did I misunderstand?

Thanks.

Hi, @Jack_Changfan.

The question has two parts:

What country? - (for that use cosine similarity: Turkey 0.9899 vs Japan 0.9839)
What distance? - (Euclidean distance: Turkey 1.41 vs Japan 1)

  # country = Ankara(9,1) - Washington(10,5) + USA(5,6) = (4, 2)

  # Cosine similarity: country (4, 2); Turkey(3, 1) = 0.9899
  print((4*3 + 2*1) / ((4**2 + 2**2)**0.5 * (3**2 + 1**2)**0.5))

  # Cosine similarity: country (4, 2); Japan(4, 3) = 0.9839
  print((4*4 + 2*3) / ((4**2 + 2**2)**0.5 * (4**2 + 3**2)**0.5))

So the correct answer is Turkey (first part) with d=1.41 (second part).
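For the distance part, the same kind of quick check (just the raw coordinates plugged into the Euclidean distance formula):

  # Euclidean distance: country (4, 2) to Turkey(3, 1) = 1.41
  print(((4 - 3)**2 + (2 - 1)**2)**0.5)

  # Euclidean distance: country (4, 2) to Japan(4, 3) = 1.0
  print(((4 - 4)**2 + (2 - 3)**2)**0.5)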

Thanks for the clarification.
One more question about the similarity we use here.
Since we are answering the analogy USA : DC = X : Ankara, I assumed that the vector X - Ankara should be similar to USA - DC. So I use:
  import numpy as np

  def cos_sim(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  USA, DC = np.array([5, 6]), np.array([10, 5])
  JPN, TUR, ANK = np.array([4, 3]), np.array([3, 1]), np.array([9, 1])

  USA_DC = USA - DC    # [-5, 1]
  JPN_ANK = JPN - ANK  # [-5, 2]
  TUR_ANK = TUR - ANK  # [-6, 0]
  print(cos_sim(USA_DC, JPN_ANK))  # 0.9832820049844603
  print(cos_sim(USA_DC, TUR_ANK))  # 0.9805806756909202
In this model, JPN - ANK has a higher similarity to USA - DC.
In your explanation, you calculate the target vector first and then compare similarities to that target vector, which produces a completely different result. But what is wrong with trying to maximize the cosine similarity between USA - DC and X - Ankara?

One disadvantage of using the word vectors themselves (measured from the origin) for the cosine similarity calculation is that once we move the reference point (the origin), the result changes completely. With USA - DC and X - Ankara, the result is the same no matter where the reference point is. Does what I'm thinking make sense?
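For example, a quick numerical sketch of what I mean (the shift vector here is an arbitrary choice, just to move the reference point):

  import numpy as np

  def cos_sim(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  USA, DC = np.array([5, 6]), np.array([10, 5])
  TUR, ANK = np.array([3, 1]), np.array([9, 1])
  shift = np.array([100, -50])  # move every point, i.e. change the reference origin

  # Raw word vectors: the cosine similarity changes when all points are shifted.
  print(cos_sim(USA, TUR), cos_sim(USA + shift, TUR + shift))

  # Difference vectors: the shift cancels out, so the similarity stays the same.
  print(cos_sim(USA - DC, TUR - ANK),
        cos_sim((USA + shift) - (DC + shift), (TUR + shift) - (ANK + shift)))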

Thanks.

Yes, it kind of makes sense if I understand you correctly. But the question is:

Use the method presented in the previous slide to predict which is the country whose capital is Ankara.

In the lecture video, when we try to predict the capital of Russia, we go from country to capital - that is, USA->DC and Russia->? - so we calculate the difference vector from country to capital ([5, -1] in that case: 5 to the right and 1 down).
When we try to find the country from the capital, we should instead estimate DC->USA and Ankara->?

So, we should try to find the vector pointing TO the country we want to find (from the reference point of Ankara):
USA - Washington = [-5, 1] # (5 to the left, and 1 up)
Unknown_country - Ankara = [?, ?]

So we plug in the DC->USA vector to find this country:
Unknown_country - Ankara = [-5, 1] # (what country is 5 to the left, and 1 up?)
which should be:
Unknown_country = Ankara - Washington + USA
Unknown_country = [4, 2] # here cosine similarity would be 1, but there is no country here.

So we find the cosine similarity to Japan and Turkey (calculations above), and I think this is what the question asks.
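Putting the whole procedure into code, here is a minimal sketch with the same toy coordinates (the two-entry candidates dictionary is of course made up just for this quiz):

  import numpy as np

  def cos_sim(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  USA, Washington, Ankara = np.array([5, 6]), np.array([10, 5]), np.array([9, 1])
  candidates = {"Japan": np.array([4, 3]), "Turkey": np.array([3, 1])}

  prediction = Ankara - Washington + USA  # [4, 2]
  # Pick the candidate country whose word vector is most cosine-similar to the prediction.
  answer = max(candidates, key=lambda w: cos_sim(candidates[w], prediction))
  print(answer)                                           # Turkey
  print(np.linalg.norm(candidates[answer] - prediction))  # 1.4142135623730951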

Now, to your question:
What you are calculating is something different:
usa_wdc_vec = usa - wdc # [-5, 1]
jpn_ank_vec = jpn - ank # [-5, 2] (cos_sim = 0.9833)
tur_ank_vec = tur - ank # [-6, 0] (cos_sim = 0.9806)

You are searching for the difference vector that is most similar to the usa - wdc difference. So yes, it is the case that "5 to the left and 2 up" is more cosine-similar than "6 to the left and 0 up/down". But this is not the method presented in the slide, and I'm not sure I understand why this method would be superior. Why would you say:

While taking USA-DC and X-Ankara, no matter where the reference point is, the result would be the same.

In particular, if you change any single point (USA, DC, JPN, TUR, ANK), the results would be different in both methods, wouldn't they?
If you are saying that you would transform the space (move [0, 0], or transform it in any other way, like skewing or rotating), then my answer is that the whole point of this space is to be exactly as it is, and such a transformation makes a huge difference.

I see. Thanks for your explanation.

Mmm, another question about this.
Say we have another word X(400, 200), which is far from the country names. With this cosine similarity we would choose it over either Japan or Turkey, since its cosine with (4, 2) is 1, right? But if we use Ankara - X and Washington - USA, their similarity would be small (-0.785), which would exclude this false match, right?
Sorry, just trying to explore the pros and cons of different ways of modeling this.
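For reference, the quick check behind those two numbers (X is of course a made-up point, and cos_sim is the same helper as above):

  import numpy as np

  def cos_sim(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  USA, Washington, Ankara = np.array([5, 6]), np.array([10, 5]), np.array([9, 1])
  X = np.array([400, 200])                 # hypothetical far-away word
  prediction = Ankara - Washington + USA   # [4, 2]

  print(cos_sim(X, prediction))                 # ~1.0 -> X would win under the slide's method
  print(cos_sim(X - Ankara, USA - Washington))  # ~-0.785 -> X is ruled out by the difference-vector comparison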

No problem - now I understand your previous question better.

To rephrase it - my main point was that you have to calculate the cosine similarity from Ankara and not from the countries, because the question asks 'which is the country whose capital is Ankara' (using the method in the previous slide) and not 'which capital is closest to Japan or Turkey or some other country'.

I see that your intuition lies with Euclidean distance, and that is where the confusion comes from. So I would ask a counter-question: what about a point Z(-500, 100)? In this case the cosine similarity of Z - Ankara with USA - Washington would be almost exactly 1 (and I think you would still consider Z a worse answer than Turkey(-6, 0)).
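Quick check (Z is, of course, a made-up point; cos_sim is the same helper as before):

  import numpy as np

  def cos_sim(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  USA, Washington, Ankara = np.array([5, 6]), np.array([10, 5]), np.array([9, 1])
  Z = np.array([-500, 100])

  print(cos_sim(Z - Ankara, USA - Washington))            # ~0.99998 -> nearly perfect by the difference-vector comparison
  print(np.linalg.norm(Z - (Ankara - Washington + USA)))  # ~513 -> yet hopelessly far from the predicted point (4, 2)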

Cosine similarity is all about the direction of the vector arrow: anything that lies on it, or closest to it in angle, scores highest (no matter how far or near it is in Euclidean space), as explained in the previous video about the 'Food corpus' vs 'Agriculture corpus' vs 'History corpus'.

In reality, you could combine Euclidean distance with cosine similarity to get a hybrid score, or use any of the many other similarity metrics out there.
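For example, one very simple (and entirely made-up) way to blend the two, just to illustrate the idea - the alpha weighting and the 1/(1 + distance) squashing are arbitrary choices, not something from the lecture:

  import numpy as np

  def cos_sim(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  def hybrid_score(candidate, prediction, alpha=0.5):
      # Blend direction agreement (cosine similarity) with closeness (1 / (1 + Euclidean distance)).
      distance = np.linalg.norm(candidate - prediction)
      return alpha * cos_sim(candidate, prediction) + (1 - alpha) / (1 + distance)

  prediction = np.array([4, 2])
  for name, vec in {"Japan": np.array([4, 3]), "Turkey": np.array([3, 1])}.items():
      print(name, hybrid_score(vec, prediction))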

Cheers!
