find_closest_word function in Lab: C1_W3_lecture_nb_02_manipulating_word_embeddings

Hi guys, I’m not sure how the find_closest_word function is working.
In particular the effect of

delta = np.sum(diff * diff, axis=1)

I thought that we would need to use cosine similarity or euclidean distance to find which is the closest word… but the calculation of delta threw me off. Any advice?

import numpy as np

# `embedding` is the DataFrame of word vectors loaded earlier in the notebook
def find_closest_word(v, k = 1):
    # Calculate the vector difference from each word to the input vector
    diff = embedding.values - v
    #print(diff.shape)
    # Get the squared norm of each difference vector.
    # It means the squared euclidean distance from each word to the input vector
    delta = np.sum(diff * diff, axis=1)
    #print(delta.shape, delta[0].shape)
    #print(delta[0])
    # Find the index of the minimum distance in the array
    i = np.argmin(delta)
    # Return the row name for this item
    return embedding.iloc[i].name

  1. Euclidean distance is \sqrt{\sum_i {(x_i - y_i)}^2}
  2. diff represents x_i - y_i
  3. diff * diff is element wise multiplication of the differences i.e. {(x_i - y_i)}^2
  4. np.sum function results in \sum_i {(x_i - y_i)}^2
  5. The square root is not needed to find the closest vector. Square root is monotonically increasing, so the vector with the smallest squared distance is also the one with the smallest Euclidean distance, and np.argmin returns the same index either way.
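
Here is a small sketch of the steps above that you can run to check this yourself. The `embedding` table and the vector `v` are made-up toy values (not the lab's real word vectors), and the words "a", "b", "c" are just placeholders.

```python
import numpy as np
import pandas as pd

# Toy 2-D "embeddings" (hypothetical values, not the lab's data)
embedding = pd.DataFrame(
    [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]],
    index=["a", "b", "c"],
)
v = np.array([0.9, 0.2])  # hypothetical query vector

diff = embedding.values - v             # x_i - y_i for every row (broadcasting)
sq_dist = np.sum(diff * diff, axis=1)   # squared Euclidean distance per word
dist = np.sqrt(sq_dist)                 # actual Euclidean distance per word

# Both arrays are minimized at the same index
i_sq = int(np.argmin(sq_dist))
i_eu = int(np.argmin(dist))
closest = embedding.iloc[i_sq].name

print(i_sq == i_eu, closest)
```

So skipping the square root only saves a bit of computation; it does not change which word is picked. Cosine similarity is a different measure (it ignores vector length), so it can rank words differently from Euclidean distance in general.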