I understand that the embedding matrix is number of features × vocab size, so every word has a unique vector. I also understand that there are different approaches for obtaining this matrix. What I need help with is how this matrix fits into a NN, since all I know of that goes into a NN is layers with hidden units. If you could show it visually, that would be great.
Typically the embeddings are used as the input to a dense NN.
The forward pass goes from the input, through the embedding layer, to the NN output, where a cost function is formed.
The gradients come from backpropagation through the NN and are extended back into the embedding layer.
A method like gradient descent is used to learn the NN weights and the embedding weights.
This all happens behind the scenes, since TensorFlow automates the learning process.
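For what it's worth, here is a rough sketch of that pipeline in Keras (the vocabulary size, embedding dimension, sequence length, and data are all made-up toy values): the Embedding layer holds the word-vector matrix, the Dense layers are the ordinary NN, and a single fit() call updates both sets of weights through backpropagation.

```python
# A minimal sketch, assuming TensorFlow/Keras; the sizes and data are made up.
import numpy as np
import tensorflow as tf

vocab_size = 10_000   # number of distinct words
embed_dim = 64        # number of features per word vector
seq_len = 20          # words per input example

# The embedding layer holds the word-vector matrix; its input is a sequence of
# integer word indices and its output is the corresponding rows of that matrix.
embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len,), dtype="int32"),
    embedding,
    tf.keras.layers.Flatten(),          # the looked-up vectors feed an ordinary dense NN
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Toy data: random word indices and binary labels, just to show that one fit()
# call updates the embedding matrix and the dense weights together via backprop.
x = np.random.randint(0, vocab_size, size=(128, seq_len))
y = np.random.randint(0, 2, size=(128, 1))
model.fit(x, y, epochs=1, verbose=0)

print(embedding.get_weights()[0].shape)   # (10000, 64): one row (word vector) per word
```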
Thanks. Some more questions on that:
The input to an embedding layer can be (is usually?) a one-hot encoded vector? (See the sketch at the end of this post.)
The weights of the embedding layer define the word vectors?
The weights of the embedding layer are trained based on the specific output of the NN?
Or is there a pre-trained model (e.g. BERT, GPT-1/2/3, etc.) that provides embedding vectors for, say, all English words, which everyone can use instead of training their own each time? What is advised?
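To make the first two questions concrete, here is a tiny NumPy sketch of my current (possibly wrong) mental model, with made-up sizes: the embedding matrix is the layer's weights, and multiplying a one-hot vector by it just picks out one row.

```python
import numpy as np

vocab_size, embed_dim = 5, 3                 # made-up toy sizes
E = np.random.rand(vocab_size, embed_dim)    # the embedding layer's weight matrix(?)

word_index = 2                               # index of some word, say "cat"
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1.0

# Multiplying the one-hot vector by the matrix selects row 2, so the one-hot
# product and a plain lookup by index give the same word vector.
print(one_hot @ E)
print(E[word_index])
```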
I’ll answer as best I can; beyond that, we’re about at the limit of my depth on this topic: