Relation between countries and capitals

A quiz question in the middle of the video Manipulating Words in Vector Space gives USA(5,6) and Washington(10,5) and asks which country has the capital Ankara(9,1). I tried to solve it in the following way:

  1. First I found the vector difference between Washington and USA (Washington - USA), which is [5 -1].
  2. Then I subtracted the result of step 1 from Ankara(9,1): [9 1] - [5 -1] = [4 2].
  3. Then, for the three candidate countries Japan(4,3), Russia(5,5) and Turkey(3,1), the Euclidean distances from [4 2] are 1, square_root(10) and square_root(2) respectively. The smallest is 1, so the country would be Japan.

So why is the given answer Turkey, with distance square_root(2) = 1.414?

If you look at the vectors of the countries, you will find that they are not normalized.
In such scenarios, cosine similarity should be used instead of Euclidean distance. Based on that, you will find that Turkey has the highest cosine similarity.
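
To make the difference between the two metrics concrete, here is a minimal sketch that scores the three candidate countries both ways. It assumes the predicted vector is Ankara - (Washington - USA) as computed in the question; the helper names `euclidean` and `cosine` are only illustrative and not taken from the course material.

```python
import math

# Coordinates taken from the question above.
usa, washington, ankara = (5, 6), (10, 5), (9, 1)
countries = {"Japan": (4, 3), "Russia": (5, 5), "Turkey": (3, 1)}

# Predicted country vector: Ankara - (Washington - USA) = (4, 2).
predicted = tuple(a - (w - u) for a, w, u in zip(ankara, washington, usa))

def euclidean(u, v):
    """Straight-line distance between two vectors (smaller = closer)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    """Cosine of the angle between two vectors (larger = more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

for name, vec in countries.items():
    print(f"{name}: distance={euclidean(predicted, vec):.4f}, "
          f"cosine={cosine(predicted, vec):.4f}")
# Japan: distance=1.0000, cosine=0.9839
# Russia: distance=3.1623, cosine=0.9487
# Turkey: distance=1.4142, cosine=0.9899
```

With these numbers the two metrics disagree: Japan is nearest by Euclidean distance, while Turkey has the highest cosine similarity.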

Hi, mohit.

vinm007 answered correctly, but just for clarification - the question is made up of two parts:

  1. What country? - (for that use cosine similarity: Turkey 0.98 vs Japan 0.88)
  2. What distance? - (Euclidean distance: Turkey 1.41 vs Japan 1)

So the right combination is Turkey with d=1.41, because this is the method that was discussed in the previous couple of videos.

I found the following cosine similarities:
Turkey=0.9899
Japan=0.9838

My cosine similarity for Japan is different from yours. Could you please check it?

My previous answer had a slight mistake. My calculations:

import numpy as np

# Cosine: Ankara (9, 1); Turkey(3, 1) = 0.98
(9*3 + 1*1) / (np.sqrt(9**2 + 1**2) * np.sqrt(3**2 + 1**2))

# Cosine: Ankara (9, 1); Japan(4, 3) = 0.86
(9*4 + 1*3) / (np.sqrt(9**2 + 1**2) * np.sqrt(4**2 + 3**2))

I have one more question about how you solved it. In this week's programming assignment, the following approach is used to find which country has the capital Ankara(9,1):
Step 1: country = Ankara(9,1) - Washington(10,5) + USA(5,6)
country = (4,2)
Step 2: Compare the cosine similarity between country(4,2) and each of Japan(4,3), Russia(5,5) and Turkey(3,1).

Whichever country maximises the cosine similarity is the answer.

So my question is: the programming assignment uses country(4,2) when computing cosine similarity against the other countries, but you compared Ankara(9,1) directly with Japan, Turkey and Russia, so Step 1 is missing from your solution. Could you clarify why you didn't use the vector from Step 1 when computing the cosine similarity?

I’m sorry. You’re right - I made a big mistake :flushed: not a “slight” mistake :blush:

  # Cosine similarity: country (4, 2); Turkey(3, 1) = 0.9899
  (4*3 + 2*1) / ((4**2 + 2**2)**0.5 * (3**2 + 1**2)**0.5)

  # Cosine similarity: country (4, 2); Japan(4, 3) = 0.9839
  (4*4 + 2*3) / ((4**2 + 2**2)**0.5 * (4**2 + 3**2)**0.5)

Your initial calculations were correct. I’m sorry about that.
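
For anyone following along, the two-step approach discussed in this thread can be sketched as a small reusable function. This is only an illustration under stated assumptions: the function name `predict_country` and the `candidates` dictionary are hypothetical and are not the assignment's actual code or data.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_country(capital1, country1, capital2, candidates):
    """Hypothetical helper: return (name, similarity) of the best candidate.

    Step 1: target = capital2 - capital1 + country1
    Step 2: pick the candidate with the highest cosine similarity to target.
    """
    target = tuple(c2 - c1 + k1
                   for c2, c1, k1 in zip(capital2, capital1, country1))
    best = max(candidates,
               key=lambda name: cosine_similarity(target, candidates[name]))
    return best, cosine_similarity(target, candidates[best])

# Toy 2-D coordinates from the question in this thread.
candidates = {"Japan": (4, 3), "Russia": (5, 5), "Turkey": (3, 1)}
name, score = predict_country((10, 5), (5, 6), (9, 1), candidates)
print(name, round(score, 4))  # Turkey 0.9899
```

The target here is (4, 2), the same vector as in Step 1 above, so the comparison is made against the predicted country vector rather than against Ankara itself.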
