I explained a simple example of Embedding weights here.
If you look at the code, trax uses RandomNormalInitializer(1.0), which just draws the weights from a Normal distribution (the red curve).
(Side note, don’t worry if this part is unclear: you probably don’t need the details of weight initialization and weight scales right now. In short, the 1.0 is a scale factor: by default the samples from the standard Normal distribution are multiplied by 1, so they are left unchanged. For big models you might want to initialize with smaller weights, i.e. a smaller scale.)
As for the start (when the model is initialized for the first time, before training on or “seeing” any example): the embedding table is just random numbers drawn from that Normal distribution.
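To make that concrete, here is a minimal numpy sketch (my own toy code, not trax’s actual implementation, and the names vocab_size, d_feature and token_to_id are made up for the illustration) of what the freshly initialized table is: a (vocab_size, d_feature) matrix of Normal samples, where an “embedding lookup” is just picking the row for a token id.

```python
import numpy as np

vocab_size, d_feature = 5, 4           # tiny vocabulary, 4-dimensional embeddings
scale = 1.0                            # the "1.0" from RandomNormalInitializer(1.0)

rng = np.random.default_rng(0)
E = scale * rng.standard_normal((vocab_size, d_feature))   # the embedding table

token_to_id = {"I": 0, "love": 1, "learning": 2, "NLP": 3, "<unk>": 4}  # made-up ids
print(E[token_to_id["NLP"]])           # "embedding lookup" = selecting one row
```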
Now, before training the model we have to choose its architecture (in other words, make design choices): how are we going to make predictions?
If we decide that word (token) order does not matter, the approach is called Bag of Words. If we care about the words around a given word (token), the approach is called Continuous Bag of Words (the order within the context may still not matter: for example, set([“I”, “love”, “learning”]) could be the “context” and [“NLP”] the “target”).
So, if we decide this is the way to go, then yes, the Embedding table gets updated according to how well the model predicts [“NLP”] when the input is set([“love”, “I”, “learning”]) (there is a small sketch of this below). But if we had chosen another path (a different way of providing inputs, an RNN or a Transformer, etc.), the Embedding table would instead be updated according to how well that model predicts its outcomes.
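Here is a tiny self-contained sketch of that CBOW idea and of the update (again my own toy numpy code, not trax; the model and names like W_out and lr are invented for the illustration): the context embeddings are averaged, a linear layer plus softmax predicts the target, and one gradient step only changes the rows of the table that were actually used as inputs.

```python
import numpy as np

# Toy setup (all names and sizes are made up for the illustration)
vocab = ["I", "love", "learning", "NLP", "<unk>"]
token_to_id = {w: i for i, w in enumerate(vocab)}
vocab_size, d_feature, lr = len(vocab), 4, 0.1

rng = np.random.default_rng(0)
E = rng.standard_normal((vocab_size, d_feature))        # embedding table (as above)
W_out = rng.standard_normal((vocab_size, d_feature))    # output projection to vocab logits

context_ids = [token_to_id[w] for w in ["I", "love", "learning"]]
target_id = token_to_id["NLP"]

# Forward pass: the order of the context words does not matter, we just average their rows
context_vec = E[context_ids].mean(axis=0)
logits = W_out @ context_vec
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                    # softmax over the vocabulary

# Backward pass (cross-entropy loss): gradient of the logits is probs - one_hot(target)
grad_logits = probs.copy()
grad_logits[target_id] -= 1.0
grad_context = W_out.T @ grad_logits                    # gradient w.r.t. the averaged context

# One SGD step: only the rows of E for "I", "love", "learning" change,
# and they change more when the prediction of "NLP" was worse.
W_out -= lr * np.outer(grad_logits, context_vec)
for idx in context_ids:
    E[idx] -= lr * grad_context / len(context_ids)
```

Whatever architecture we pick instead (RNN, Transformer, etc.), the same thing happens in spirit: the prediction error flows back as gradients into the embedding rows of the tokens that were used as inputs.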