A quiz question partway through the video "Manipulating Words in Vector Space" asks: given USA(5,6) and Washington(10,5), which country has the capital Ankara(9,1)? I tried to solve it in the following way:
First, I found the vector difference between Washington and USA (Washington - USA), which is [5, -1].
Then I subtracted that result from Ankara(9,1): [9, 1] - [5, -1] = [4, 2].
Then I compared [4, 2] with the three candidate countries: Japan(4,3), Russia(5,5) and Turkey(3,1). The Euclidean distances from [4, 2] to them are 1, square_root(10) and square_root(2), respectively. The smallest is 1, so the country is Japan.
So why is the given answer Turkey, with distance square_root(2) = 1.414?
If you look at the country vectors, you will see that they are not normalized (they have different lengths).
In such cases, cosine similarity should be used instead of Euclidean distance, because it compares only the directions of the vectors and ignores their lengths. Based on that, you will find that Turkey has the highest cosine similarity.
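To make the difference between the two measures concrete, here is a minimal sketch that applies both to the predicted vector [4, 2] from the question. The coordinates come from the quiz; the helper names `euclidean` and `cosine_similarity` are just illustrative and are not taken from the course code.

```python
import math

# Coordinates taken from the quiz question in this thread.
countries = {"Japan": (4, 3), "Russia": (5, 5), "Turkey": (3, 1)}
predicted = (4, 2)  # Ankara - (Washington - USA)

def euclidean(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between two 2-D vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

for name, vec in countries.items():
    print(name,
          round(euclidean(predicted, vec), 3),
          round(cosine_similarity(predicted, vec), 3))

# Smallest Euclidean distance vs. largest cosine similarity.
closest = min(countries, key=lambda n: euclidean(predicted, countries[n]))
most_similar = max(countries, key=lambda n: cosine_similarity(predicted, countries[n]))
print(closest)       # Japan
print(most_similar)  # Turkey
```

Japan is the nearest point by Euclidean distance, but Turkey points in the most similar direction, which is why the two measures disagree here.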
I have one more question about how you solved it. I worked through this week's programming assignment, and to find which country has the capital Ankara(9,1) it uses the following approach:
Step 1: country = Ankara(9,1) - Washington(10,5) + USA(5,6)
country = (4,2)
Step 2: They computed the cosine similarity between country(4,2) and each of Japan(4,3), Russia(5,5) and Turkey(3,1).
Whichever country maximises the cosine similarity is the answer.
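The two steps above can be sketched roughly as follows. This is only my reading of the approach, not the assignment's actual code; the function name `predict_country` and its signature are made up for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def predict_country(city1, country1, city2, candidates):
    """Hypothetical helper: answer 'city1 is to country1 as city2 is to ?'."""
    # Step 1: country = city2 - city1 + country1
    target = tuple(c2 - c1 + k1 for c1, k1, c2 in zip(city1, country1, city2))
    # Step 2: pick the candidate with the highest cosine similarity to target
    best = max(candidates, key=lambda name: cosine_similarity(target, candidates[name]))
    return target, best

candidates = {"Japan": (4, 3), "Russia": (5, 5), "Turkey": (3, 1)}
target, best = predict_country(
    city1=(10, 5),    # Washington
    country1=(5, 6),  # USA
    city2=(9, 1),     # Ankara
    candidates=candidates,
)
print(target)  # (4, 2)
print(best)    # Turkey
```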
So my question is this: the programming assignment compares country(4,2) with the other countries using cosine similarity, but you compared Ankara(9,1) directly with Japan, Turkey and Russia. In other words, you did not use Step 1 in your solution. Could you please clarify why you didn't use the vector from Step 1 when computing cosine similarity?