Course 3 Embedding Layer vs Course 2's Extracting Word Embeddings

Hi all,

At the end of Course 2, in Week 4's content, we covered getting embeddings for the words/vocabulary using one of three options involving the W matrices. How do these differ from the embedding layers?


Hi @LiamB

At the end of Course 2, the embeddings were trained using a simple neural network that was not "context aware" (in other words, a simple continuous bag-of-words (CBOW) model). In such a case, "dog ate a cat" is treated the same as "cat ate a dog".
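To make the Course 2 part concrete, here is a rough sketch of the three options for reading word embeddings off the trained CBOW weight matrices. The weights below are random stand-ins for trained ones, and the shapes follow the convention where W1 maps one-hot inputs to the hidden layer:

```python
import numpy as np

# Random stand-ins for trained CBOW weights (illustration only).
V, N = 5, 3                        # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
W1 = rng.standard_normal((N, V))   # input-to-hidden weights
W2 = rng.standard_normal((V, N))   # hidden-to-output weights

emb_w1  = W1.T                     # option 1: columns of W1
emb_w2  = W2                       # option 2: rows of W2
emb_avg = (W1.T + W2) / 2          # option 3: average of the two

# Each option yields one n_features-dimensional vector per word:
print(emb_w1.shape, emb_w2.shape, emb_avg.shape)  # (5, 3) (5, 3) (5, 3)
```

Whichever option you pick, the result is the same kind of object as an embedding layer's weights: a (vocabulary x n_features) table.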

Course 3 introduced "context aware" embeddings (with RNNs). In that case, the embeddings are "tied" to the context (in an RNN's case, to the hidden states). So "dog ate a cat" would not be the same as "cat ate a dog", because each embedding was trained in accordance with the hidden states.
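A toy illustration of how the RNN ties embeddings to word order through its hidden state. The weights here are random and untrained (purely hypothetical); the only point is that feeding the same tokens in reversed order produces a different final hidden state:

```python
import numpy as np

# Random, untrained weights: illustration only.
rng = np.random.default_rng(1)
emb = rng.standard_normal((4, 3))   # embedding table: 4 tokens, dim 3
Wh  = rng.standard_normal((3, 3))   # hidden-to-hidden weights
Wx  = rng.standard_normal((3, 3))   # input-to-hidden weights

def final_state(token_ids):
    h = np.zeros(3)
    for t in token_ids:
        h = np.tanh(Wh @ h + Wx @ emb[t])  # simple Elman RNN step
    return h

sent     = [0, 1, 2, 3]             # e.g. "dog ate a cat"
sent_rev = [3, 2, 1, 0]             # e.g. "cat ate a dog"
# Same tokens, different order -> different final hidden states:
print(np.allclose(final_state(sent), final_state(sent_rev)))
```

During training, gradients flow back through those hidden states into the embedding table, which is what makes the learned vectors "context aware".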

The question is somewhat contrived, in a sense: an embedding layer is nothing more than a vector assigned to each token, i.e. a table of shape (vocabulary x n_features). When you want some token's embedding, you just pick it out of this table/matrix.
As I explained, the difference is in how these values came to be (how they were "trained") and how they are used (what calculations are performed with them).
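The "picking it out of the table" part can be sketched in a couple of lines (the numbers below are illustrative, not from the course notebooks):

```python
import numpy as np

# A minimal embedding layer: just a (vocabulary x n_features) table.
vocab_size, n_features = 4, 3
table = np.arange(vocab_size * n_features, dtype=float)
table = table.reshape(vocab_size, n_features)

token_ids = np.array([2, 0, 2])   # a tokenized sentence
embeddings = table[token_ids]     # lookup = picking rows out of the table
print(embeddings.shape)           # (3, 3): one vector per token
```

Frameworks like TensorFlow or PyTorch wrap exactly this lookup in a layer so the table can be updated by backpropagation.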

I hope that makes sense 🙂