Why do the columns of the W1 matrix correspond to the words at the corresponding index in V (the vocabulary)?


In one of the lectures the professor says that the weights in column i of W1 correspond to the embedding of the word at index i in the vocabulary.

I want to ask why this is so. Is it only because of the dimensions, i.e. W1 has shape (N, V), so each column must correspond to a word? But how can we say that a column belongs only to the word at the corresponding index in the vocabulary and not to any other word in the vocab? Is there any intuitive explanation for this?

Hi @God_of_Calamity

The intuitive explanation could be that each word is associated with some number, and this pairing never changes. For example, the word “cat” can be represented by the number 23, and this would never change. Whenever we encounter the word “cat” we know that it is number 23, and whenever we deal with the number 23, we know that it’s “cat”.

The weight matrix also never changes its dimensions, so whether the prediction is right or wrong, we modify only the weights of the words that are present in the sentence (we backprop to “cat” only when the 23rd line was involved in the calculations).
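A minimal numpy sketch of this idea (the sizes, the random seed, and using index 23 for “cat” are just assumptions for illustration): multiplying W1 by a one-hot input vector picks out exactly the column paired with that word’s index, so that column can only ever act as the embedding of that word.

```python
import numpy as np

# Toy sizes (assumed): N = embedding dimension, V = vocabulary size.
N, V = 4, 50
rng = np.random.default_rng(0)
W1 = rng.normal(size=(N, V))      # one column per vocabulary word

cat_index = 23                    # the fixed pairing: "cat" <-> 23
x = np.zeros(V)
x[cat_index] = 1.0                # one-hot input vector for "cat"

h = W1 @ x                        # hidden-layer activation for this input
# The product selects exactly column 23 of W1 and nothing else,
# which is why that column acts as "cat"'s embedding:
assert np.allclose(h, W1[:, cat_index])
```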

Cheers

Do you mean to say that if y_true = “cat” and we made a wrong prediction, say y_pred = “dog”, then during backprop only the column belonging to “cat”, i.e. the column at index 23, would have its parameters modified?

I’m not sure you could phrase it this way. What I mean is that it’s not about the prediction; it’s more about the way the input is constructed.

Whether you predicted “cat” or “dog” correctly only changes whether you increase or decrease the weights involved in the calculations.

But to your original question:

In other words, if “cat” is in the input, then its weights would be changed (increased or decreased). If “cat” was not in the input, then its weights would not be changed (neither increased nor decreased). There might be thousands of words/rows in the vocabulary, but the only ones that are modified are the ones that appear in the mini-batch used for the predictions.

“cat” is always the same row (number 23) - it never changes its position. The weights in this row are only modified when they are “involved” in predicting.
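A small sketch of that last point, under the same toy setup as above (the sizes and the error vector are made up): for a one-hot input, the gradient of the loss with respect to W1 is an outer product, so it is zero in every column except the one belonging to the word that was actually in the input.

```python
import numpy as np

N, V = 4, 50
rng = np.random.default_rng(1)

x = np.zeros(V)
x[23] = 1.0                         # "cat" (index 23) is in the input
delta = rng.normal(size=N)          # assumed error signal at the hidden layer

grad_W1 = np.outer(delta, x)        # dL/dW1 has shape (N, V)

# Only "cat"'s column receives a non-zero update; every other word's
# weights are left untouched by this training example:
print(np.flatnonzero(np.abs(grad_W1).sum(axis=0)))   # -> [23]
```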

Cheers
