Why multiply a one-hot vector by the word embedding matrix when we can just extract the column?

This lecture covers the uses of word embeddings in our projects. My question: if we need the column for a word, why do we matrix-multiply the embedding matrix by the word's one-hot encoding instead of just extracting the column? We know exactly which column the output is going to be, so why do the unnecessary computation?
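
For concreteness, here is a minimal NumPy sketch of the equivalence I mean (the vocabulary and embedding sizes are made up for illustration):

```python
import numpy as np

vocab_size, embed_dim = 5, 3  # made-up sizes for illustration
rng = np.random.default_rng(0)
E = rng.standard_normal((embed_dim, vocab_size))  # embedding matrix, one column per word

idx = 2                        # position of the word in the vocabulary
one_hot = np.zeros(vocab_size)
one_hot[idx] = 1.0

# Multiplying by the one-hot vector selects exactly one column...
via_matmul = E @ one_hot
# ...which is the same as extracting that column directly.
via_indexing = E[:, idx]

assert np.allclose(via_matmul, via_indexing)
```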


The one-hot encoding in the vocabulary is the traditional way of identifying a word without using its index value directly. Using the index value would introduce unintended and misleading similarity between words. For example, if the vocabulary is in alphabetical order, then “ape”, “apostrophe”, and “apple” would appear to have similar numerical meanings, even though they are very different words.
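
To make this concrete, here is a small sketch of what goes wrong when the raw index is fed in as a number (the words and their alphabetical positions are just for illustration):

```python
import numpy as np

# Hypothetical alphabetical vocabulary positions, for illustration only
words = {"ape": 0, "apostrophe": 1, "apple": 2, "zebra": 25}

# Treating the index itself as a numeric feature implies an ordering:
# "ape" looks much closer to "apostrophe" than to "zebra", even though
# alphabetical position says nothing about meaning.
print(abs(words["ape"] - words["apostrophe"]))  # 1
print(abs(words["ape"] - words["zebra"]))       # 25

# With one-hot vectors, every pair of distinct words is equally far apart:
vocab_size = 26

def one_hot(i):
    v = np.zeros(vocab_size)
    v[i] = 1.0
    return v

print(np.linalg.norm(one_hot(0) - one_hot(1)))   # sqrt(2)
print(np.linalg.norm(one_hot(0) - one_hot(25)))  # sqrt(2)
```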

I have a similar question and still haven’t quite got it.

Doesn’t the model only care about the extracted output rather than the indices? As a simple example, I think this is analogous to declaring a simple list of lists lst, where lst[0] = [5, 6, 7] and lst[1] = [8, 9, 0]. When using direct indexing, the indices 0 and 1 look similar, yes, but the extracted values [5, 6, 7] and [8, 9, 0] are completely different, and the code works with the inner values, not the index positions.
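
In code, the analogy I have in mind (the values are arbitrary):

```python
lst = [[5, 6, 7], [8, 9, 0]]

# The indices 0 and 1 are "close" as numbers, but downstream code only
# ever sees the extracted values, which can be completely different:
print(lst[0])  # [5, 6, 7]
print(lst[1])  # [8, 9, 0]
```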

Given that the embedding matrix is learnable, that the indexing operation (or, equivalently, the multiplication by the one-hot encoding) extracts one specific column from it, and that the one-hot encoding is fixed across the training session, I don’t see any reason why direct indexing would cause misleading similarity.
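
For what it’s worth, this also seems to be how frameworks implement embedding layers in practice: as a learnable lookup table rather than a one-hot matrix multiplication. A quick PyTorch check of the equivalence (assuming PyTorch is installed; note nn.Embedding stores one row per word rather than one column):

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 5, 3
emb = nn.Embedding(vocab_size, embed_dim)  # learnable lookup table, shape (5, 3)

idx = torch.tensor([2])
one_hot = torch.zeros(vocab_size)
one_hot[2] = 1.0

# Direct lookup (what the layer actually does)...
via_lookup = emb(idx).squeeze(0)
# ...matches the one-hot matrix multiplication.
via_matmul = one_hot @ emb.weight

assert torch.allclose(via_lookup, via_matmul)
```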

Could you please elaborate more on this point or give some other reasons for the original question? Thank you in advance.