The section below in the notebook explaining the tl.Mean layer in the context of an embedding layer has some subtle mistakes. These don't affect the correctness of the output (the description of how Mean works is accurate), but the explanation of how it is applied to word embeddings is wrong.
After applying the embedding layer created in the top cell to the batch of inputs in the second cell, we do get "Shape of returned array is (2, 3, 2)". However, the axis 1 dimension (size 3) does not represent the vocabulary size, as claimed in the comment "# Embedding layer will return an array of shape (batch size, vocab size, d_feature)". It is actually the number of words in each set of tokenized words, i.e. the sequence length.
The example is poorly constructed in that the vocabulary size happens to equal the number of words in the input list (both 3). However, if we increase the number of words in the input list by 1, we can confirm that the sequence length, not the vocabulary size, is the driving factor:
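A minimal sketch of that experiment (keeping the notebook's tiny setup of vocab_size=3 and d_feature=2; the token ids below are my own, not the notebook's):

```python
import numpy as np
from trax import layers as tl
from trax import shapes

# Same tiny setup as the notebook: 3 vocabulary ids, 2-dimensional embeddings.
embedding_layer = tl.Embedding(vocab_size=3, d_feature=2)

# Batch of 2 examples, now with 4 tokens each instead of 3.
x = np.array([[0, 1, 2, 1],
              [1, 2, 0, 0]])
embedding_layer.init(shapes.signature(x))

y = embedding_layer(x)
print(y.shape)  # (2, 4, 2): axis 1 follows the sequence length, not the vocab size
```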
This error propagates to the explanation below the cell ("an embedding vector that is an average of all words in the vocabulary"). We are in fact averaging across the words in the input list, not across the entire vocabulary.
From the lecture notes: here is the embedding matrix for the vocabulary:
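(The slide itself isn't reproduced here; a stand-in with invented values, where only the shape matters, shows what the lookup does:)

```python
# Illustrative stand-in for the lecture's embedding matrix; the numbers are
# made up, only the shape (vocab_size=3, d_feature=2) matters.
embedding_matrix = np.array([[1.0, 2.0],   # embedding of word id 0
                             [3.0, 4.0],   # embedding of word id 1
                             [5.0, 6.0]])  # embedding of word id 2

# The embedding layer is a row lookup: a 3-word sentence selects 3 rows.
sentence_ids = np.array([0, 2, 1])
sentence_embeddings = embedding_matrix[sentence_ids]  # shape (3, 2)
```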
And here is the mean layer being applied to calculate the mean word embedding for the input sentence:
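(Again, in place of the slide, the equivalent computation with tl.Mean on the batched output y from the snippet above:)

```python
# Averaging over axis 1 (the word axis of the batched embedding output)
# yields one "mean word" vector per example.
mean_layer = tl.Mean(axis=1)
z = mean_layer(y)   # y has shape (batch_size, seq_len, d_feature)
print(z.shape)      # (2, 2): one mean vector per example
```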
The next section also has very confusing errors:
"Pretend the embedding matrix uses 2 elements for embedding the meaning of a word and has a vocabulary size of 3. So it has shape (2,3)."
The embedding matrix in Trax has shape (vocab_size, embedding_size), so this should be (3,2).
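This is easy to check on the layer from the first snippet (after init, its weights attribute holds the embedding table):

```python
# The weight matrix is laid out as (vocab_size, d_feature).
print(embedding_layer.weights.shape)  # (3, 2), not (2, 3)
```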
# take the mean along axis 0
The same mistake from earlier is repeated here: neither in the lecture nor in the Exercise 5 - classifier section below do we take the mean of the embedding matrix directly. Instead we take the mean of the output of the embedding layer, and that output does not have any dimension corresponding to the vocabulary size. This is really confusing.
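To make the contrast concrete (reusing the names from the sketches above, which are mine, not the notebook's):

```python
# What the notebook's comment describes: averaging the (3, 2) embedding
# matrix over its vocabulary axis, mixing ALL vocabulary words together.
vocab_mean = embedding_matrix.mean(axis=0)  # shape (2,)

# What the lecture and Exercise 5 actually do: average the embedding
# layer's OUTPUT over the word axis of each input example.
sentence_means = y.mean(axis=1)             # shape (2, 2)
```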