C3 W1 Assignment Model intuition

I am trying to understand the logic of the model architecture. The input seems to be a vector of vocab IDs for each word in the tweet (in batches). This vector is passed to the embedding layer, whose output is a matrix (or batch of matrices) of shape, for each member of the batch, number of tokens x embedding dims. This matrix (or batch of matrices) is then passed to the Mean layer, which averages each embedding column over the tokens in a tweet, so the output is 1 x embedding dims (times batches). This is then fed into a Dense layer and softmaxed. Are these steps correct?

If yes, then is the embedding layer constructed in a way that it expects vectors of IDs and returns the respective embedding vector for each token?


Hey @gkouro,
Yes, these steps are correct.

  1. tweet_to_tensor converts a sentence of words into a list of IDs, where each word is mapped to its ID from the vocabulary
  2. data_generator uses this function for each of the selected positive and negative tweets in a batch from all the examples. It also makes sure that each sentence has a uniform length for its list of IDs, by appending zeros.
  3. Now, if we pass, say, an input of dimensionality (32, 10) to the embedding layer, it means that we have 32 samples, each represented by a list of length 10.
  4. Say the embeddings of words are represented by 100 features. So the embedding layer’s output will have a dimensionality of (32, 10, 100).
  5. This output is fed to a Mean layer, which returns a (32, 100) dimensional output. In other words, it takes the average of the word embeddings of the different words in a sentence.
  6. This is then passed through a Dense layer, followed by Softmax.
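The shape changes in the steps above can be traced end to end with plain numpy. This is only a sketch of the data flow, not the assignment's Trax model: the vocabulary, tweet batch, and random weights below are made up for illustration, and the Mean step simply averages over all positions (including the zero pads), as the list above describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; the assignment builds the real one from the tweets.
vocab = {"__PAD__": 0, "i": 1, "love": 2, "nlp": 3, "deep": 4, "learning": 5}

def tweet_to_ids(tokens, vocab):
    # Step 1: map each word to its ID from the vocabulary.
    return [vocab[t] for t in tokens]

batch = [["i", "love", "nlp"], ["deep", "learning"]]
max_len = 4  # Step 2: pad every tweet to a uniform length with zeros
ids = np.array([tweet_to_ids(t, vocab) + [0] * (max_len - len(t))
                for t in batch])                 # shape (2, 4): 2 samples, 4 IDs each

d_feature = 5
E = rng.normal(size=(len(vocab), d_feature))     # trainable embedding table
emb = E[ids]                                     # Steps 3-4: (2, 4, 5), one vector per token
mean = emb.mean(axis=1)                          # Step 5: (2, 5), average over tokens

W, b = rng.normal(size=(d_feature, 2)), np.zeros(2)
logits = mean @ W + b                            # Step 6: Dense layer -> (2, 2)
z = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = z / z.sum(axis=1, keepdims=True)         # Softmax over the 2 classes
```

With a real batch size of 32, a padded length of 10, and 100 embedding features, the shapes would be exactly the (32, 10) → (32, 10, 100) → (32, 100) sequence from the list.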

Yes, it is also mentioned in the output of help(tl.Embedding).

Trainable layer that maps discrete tokens/IDs to vectors.
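In other words, the embedding layer is essentially a trainable lookup table: row i of the weight matrix is the vector for token ID i. A minimal numpy stand-in (the table values here are arbitrary, just to make the lookup visible):

```python
import numpy as np

# Stand-in for the table tl.Embedding learns: vocab_size=4, d_feature=3.
E = np.arange(12, dtype=float).reshape(4, 3)

ids = np.array([2, 0, 2])   # a list of token IDs
vectors = E[ids]            # lookup: one row of E per ID -> shape (3, 3)
```

The same ID always maps to the same row, which is why passing a whole batch of ID lists returns one embedding vector per token.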

I hope this helps.
