How does the embedding matrix appear in a neural network

gkouro · August 22, 2022, 10:54am

I understand that the embedding matrix is actually a number of features * vocab size. So every word has a unique vector. I also understand that there are different approaches to get this matrix. What I need help with is how does this matrix fit in a nn since all I know that goes into an nn are layers with hidden units? If you could show visually that would be great

TMosh · August 26, 2022, 2:37am

Typically the embeddings are used as the input to a dense NN.
Taken from input through the embedding layer to the NN output, this forms a cost function.

The gradients come from backpropagation through the NN and extended back into the embedding layer.

A method like gradient descent is used to learn the NN weights and the embedding weights.

This all happens behind the scenes, since TensorFLow automates the learning process.

gkouro · August 26, 2022, 10:06am

Thanks. Some more questions on that:
The input to an embedding layer can be (is usually?) an one-hot encoded vector?
The weights of the embedding layer define the word-vector?
The weights of the embedding layer are trained based on the specific output of the NN?
Or is there a pre-trained model (eg BERT, GPT-1/2/3 etc) that provides embedding vectors for all english words for example, that everyone can use instead of training our own each time? What is advised?

TMosh · August 27, 2022, 5:47am

I’ll answer as best I can, then we’re about out of my depth in this topic:

gkouro:

The input to an embedding layer can be (is usually?) an one-hot encoded vector?
Yes.

The weights of the embedding layer define the word-vector?
I"m not sure what you mean by “define the word-vector”.

The weights of the embedding layer are trained based on the specific output of the NN?
They’re trained by the gradients of the total system - not just the output NN.

Or is there a pre-trained model (eg BERT, GPT-1/2/3 etc) that provides embedding vectors for all english words for example, that everyone can use instead of training our own each time? What is advised?
Because learning the embeddings is very computationally costly (since the language set is huge), and they’re essentially constants, there is little point in everyone repeating exactly the same training exercise. So typically folks will use a standard embedding.

Topic		Replies	Views
Word embedding as input Sequence Models	1	283	December 18, 2023
Extracting Word Embedding Vectors — what do we do? NLP with Probabilistic Models week-4	2	476	July 15, 2023
Understanding Word Embeddings Sequence Models week-2	2	251	February 14, 2024
Embedding layer, why is it needed? NLP with Sequence Models week-1	4	851	September 12, 2022
Week2, emojify v_2, Embedding layer Sequence Models	2	526	April 21, 2022

How does the embedding matrix appear in a neural network

Related topics