Hi guys, I'm not sure how the find_closest_word function works.
In particular, I don't understand what this line does:
delta = np.sum(diff * diff, axis=1)
I thought we would need cosine similarity or Euclidean distance to find the closest word, but the calculation of delta threw me off. Any advice? Here is the full function:
import numpy as np

# `embedding` is a pandas DataFrame defined elsewhere: one row per word, indexed by the word
def find_closest_word(v, k = 1):
    # Note: k is not used in the body; only the single closest word is returned
    # Calculate the vector difference from each word to the input vector
    diff = embedding.values - v
    #print(diff.shape)
    # Sum the squared components of each difference vector.
    # This is the squared Euclidean distance from each word to the input vector
    delta = np.sum(diff * diff, axis=1)
    #print(delta.shape, delta[0].shape)
    #print(delta[0])
    # Find the index of the minimum distance in the array
    i = np.argmin(delta)
    # Return the row name for this item
    return embedding.iloc[i].name
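
Working through it on a toy example, it looks like delta is just the squared Euclidean distance for each row, and since the square root is monotonic, argmin over delta should pick the same row as argmin over the actual Euclidean distance. Here is a minimal sketch to check that. The embedding values and the words below are made up for illustration and are not the real data; the cosine similarity part is only there for comparison and is not what the function does.

```python
import numpy as np
import pandas as pd

# Toy embedding table (made-up values): one row per word, indexed by the word
embedding = pd.DataFrame(
    [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]],
    index=["king", "queen", "apple"],
)
v = np.array([0.9, 0.1])

# Broadcasting subtracts v from every row of the table
diff = embedding.values - v

# Squared Euclidean distance per row (what the function computes)
delta = np.sum(diff * diff, axis=1)

# Actual Euclidean distance per row, for comparison
dist = np.linalg.norm(diff, axis=1)

# Both rankings agree on the closest row
closest = embedding.index[np.argmin(delta)]
closest_euclid = embedding.index[np.argmin(dist)]

# Cosine similarity per row (larger means more similar), for comparison only
row_norms = np.linalg.norm(embedding.values, axis=1)
cos = embedding.values @ v / (row_norms * np.linalg.norm(v))
closest_cos = embedding.index[np.argmax(cos)]

print(closest, closest_euclid, closest_cos)
```

So skipping the square root only saves some computation and does not change which word comes out closest. Cosine similarity happens to agree on this toy data, but in general it can pick a different word because it ignores vector length.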