In the Jupyter notebook C1_W3_lecture_nb_02_manipulating_word_embeddings, there is a function find_closest_word that is used to find the 'country'.
In the function there are these commented lines:
# Get the norm of each difference vector.
# It means the squared euclidean distance from each word to the input vector
delta = np.sum(diff * diff, axis=1)
But by the definition given in the slides, the Euclidean distance is calculated after taking the square root of the dot product. This is creating confusion; what do you guys think?
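For reference, the definition from the slides can be written as

$$d(\mathbf{w}, \mathbf{v}) = \sqrt{\sum_i (w_i - v_i)^2} = \sqrt{(\mathbf{w} - \mathbf{v}) \cdot (\mathbf{w} - \mathbf{v})},$$

i.e. the square root is taken after summing the squared differences (the dot product of the difference vector with itself), whereas the notebook line above stops at the sum of squares.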
I have tried various other methods (going by the definition); please find them below.
# Original notebook version:
# Get the norm of each difference vector.
# It means the squared euclidean distance from each word to the input vector
delta = np.sum(diff * diff, axis=1)

# Variant following the slide definition: np.sqrt has been added,
# so delta is the actual euclidean distance from each word to the input vector
delta = np.sqrt(np.sum(diff * diff, axis=1))

# By definition, the one-liner below is the L2 norm of each difference vector
delta = np.linalg.norm(diff, axis=1)
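To make the comparison concrete, here is a minimal, self-contained sketch (the names embeddings and input_vec are placeholders I made up, not the notebook's actual variables) showing that all three variants pick the same closest row, because the square root is monotonically increasing:

```python
import numpy as np

# Placeholder data: 5 "word vectors" of dimension 300 and one input vector.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((5, 300))
input_vec = rng.standard_normal(300)

diff = embeddings - input_vec                      # difference of each row from the input

squared   = np.sum(diff * diff, axis=1)            # squared Euclidean distances (notebook version)
with_sqrt = np.sqrt(np.sum(diff * diff, axis=1))   # true Euclidean distances
via_norm  = np.linalg.norm(diff, axis=1)           # same as with_sqrt, via the L2 norm

# sqrt is monotonic, so the index of the minimum is identical in all three cases.
assert np.argmin(squared) == np.argmin(with_sqrt) == np.argmin(via_norm)
print("closest row index:", np.argmin(squared))
```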
I am more concerned about the “definition” and what is written in the comment.
Yes, I know the result will be the same without using the square root, as I already showed in the code samples I posted.
For an example with 300 vectors, as used in the course, I don't think there will be a huge saving in computational cost. Even if there is, it should be mentioned in the comments.
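If anyone wants to check the cost difference themselves, here is a quick timing sketch (the array shape is an arbitrary placeholder, not the course data, and I am not claiming any particular result):

```python
import numpy as np
import timeit

rng = np.random.default_rng(0)
diff = rng.standard_normal((10_000, 300))  # arbitrary placeholder shape

t_squared = timeit.timeit(lambda: np.sum(diff * diff, axis=1), number=100)
t_sqrt    = timeit.timeit(lambda: np.sqrt(np.sum(diff * diff, axis=1)), number=100)
t_norm    = timeit.timeit(lambda: np.linalg.norm(diff, axis=1), number=100)

print(f"sum of squares only: {t_squared:.4f}s")
print(f"with np.sqrt:        {t_sqrt:.4f}s")
print(f"np.linalg.norm:      {t_norm:.4f}s")
```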
Hey @kamlesh_karki,
Welcome, and we are glad that you could be a part of our community. As Tom already explained, the results won't differ in either case; and as far as the comment goes, let me raise an issue with the team.
To keep the code as is, the only thing we would need to change is the following comment from:
Get the norm of each difference vector.
to:
Get the squared L2 norm of each difference vector.
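For what it's worth, here is a hypothetical sketch of how that kept code would read with the corrected comment (the function name find_closest_word is from the notebook, but the signature and the other variable names here are my assumptions for illustration, not the notebook's actual code):

```python
import numpy as np

def find_closest_word(vector, embedding_matrix, words):
    # Hypothetical signature: one word vector per row of `embedding_matrix`,
    # with `words` as the matching list of word strings.
    diff = embedding_matrix - vector
    # Get the squared L2 norm of each difference vector.
    delta = np.sum(diff * diff, axis=1)
    # The smallest (squared) distance identifies the closest word.
    return words[np.argmin(delta)]
```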