Why do the columns of the W1 matrix correspond to the words at the corresponding index in V (the vocabulary)?


In one of the lectures the professor says that the weights in column i of W1 correspond to the embedding of the word at index i in the vocabulary.

I want to ask why this is so. Is it only because of the dimensions, i.e. W1 has shape (N, V), so each column must correspond to a word? But how can we say that a column belongs only to the word at the corresponding index in the vocabulary and not to any other word in the vocab? Is there any intuitive explanation for this?

Hi @God_of_Calamity

The intuitive explanation could be that each word is associated with some number, and this pairing never changes. For example, the word “cat” can be represented by the number 23, and this would never change. Whenever we encounter the word “cat” we know that it is number 23, and whenever we deal with the number 23, we know that it’s “cat”.

The weight matrix also never changes its dimensions, so whether the prediction is right or wrong, we modify only the weights of the words that are present in the sentence (we backprop to “cat” only when the 23rd line was involved in the calculations).
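A minimal numpy sketch of this idea (the sizes, the random seed, and using index 23 for “cat” are just assumptions for illustration): multiplying W1 by a one-hot input vector picks out exactly the column paired with that word’s index, so that column can only ever act as the embedding of that word.

```python
import numpy as np

# Toy sizes (assumed): N = embedding dimension, V = vocabulary size.
N, V = 4, 50
rng = np.random.default_rng(0)
W1 = rng.normal(size=(N, V))      # one column per vocabulary word

cat_index = 23                    # the fixed pairing: "cat" <-> 23
x = np.zeros(V)
x[cat_index] = 1.0                # one-hot input vector for "cat"

h = W1 @ x                        # hidden-layer activation for this input
# The product selects exactly column 23 of W1 and nothing else,
# which is why that column acts as "cat"'s embedding:
assert np.allclose(h, W1[:, cat_index])
```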

Cheers

Do you mean to say that if y_true = “cat” and we made a wrong prediction, say y_pred = “dog”, then during backprop only the column belonging to “cat”, i.e. the column at index 23, would have its parameters modified?

I’m not sure you could phrase it this way. What I mean is that it’s not about the prediction; it’s more about the way the input is constructed.

Whether you predicted “cat” or “dog” correctly only changes whether you increase or decrease the weights involved in the calculations.

But to your original question:

In other words, if “cat” is in the input, then its weights would be changed (increased or decreased). If “cat” was not in the input, then its weights would not be changed (neither increased nor decreased). There might be thousands of words/rows in the vocabulary, but the only ones that are modified are the ones that appear in the mini-batch used for the predictions.

“cat” is always the same row (number 23) - it never changes its position. The weights in this row are only modified when they are “involved” in predicting.
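A small sketch of that last point, under the same toy setup as above (the sizes and the error vector are made up): for a one-hot input, the gradient of the loss with respect to W1 is an outer product, so it is zero in every column except the one belonging to the word that was actually in the input.

```python
import numpy as np

N, V = 4, 50
rng = np.random.default_rng(1)

x = np.zeros(V)
x[23] = 1.0                         # "cat" (index 23) is in the input
delta = rng.normal(size=N)          # assumed error signal at the hidden layer

grad_W1 = np.outer(delta, x)        # dL/dW1 has shape (N, V)

# Only "cat"'s column receives a non-zero update; every other word's
# weights are left untouched by this training example:
print(np.flatnonzero(np.abs(grad_W1).sum(axis=0)))   # -> [23]
```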

Cheers
