Subtle, confusing errors in C3W1 notebook explanation of Mean

The section below in the notebook, which explains the tl.Mean layer in the context of an embedding layer, has some subtle mistakes. These don’t affect the output’s correctness (it describes correctly how Mean works), but the explanation of how it is applied to word embeddings is wrong.

After applying the embedding layer created in the top cell to the batch of inputs in the second cell, we do get Shape of returned array is (2, 3, 2). However, the axis 1 dimension (size 3) does not represent the vocabulary size, as claimed in the comment # Embedding layer will return an array of shape (batch size, vocab size, d_feature). It is rather the number of words in each set of tokenized words, i.e. the sequence length.
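To see this concretely, here is a minimal plain-numpy sketch of what the two cells do (the token ids and embedding values below are made up, not the notebook’s):

```python
import numpy as np

# Toy embedding matrix: vocab_size=3 rows, d_feature=2 columns (values made up).
embedding_matrix = np.arange(3 * 2).reshape(3, 2)

# Batch of 2 tokenized inputs, each 3 words long.
batch = np.array([[0, 1, 2],
                  [2, 1, 0]])

embedded = embedding_matrix[batch]  # row lookup: one embedding row per token id
print(embedded.shape)               # (2, 3, 2) = (batch size, seq length, d_feature)
```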

The example is poorly constructed in that the vocabulary size happens to equal the number of words in the input list (both 3). However, if we increase the number of words in the input list by 1, we can confirm that the sequence length, not the vocabulary size, is the driving factor:
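Sticking with the same toy sketch, adding a fourth word to each input changes axis 1 from 3 to 4 even though the vocabulary still has only 3 words:

```python
import numpy as np

embedding_matrix = np.arange(3 * 2).reshape(3, 2)  # vocab_size=3, d_feature=2
longer_batch = np.array([[0, 1, 2, 0],
                         [2, 1, 0, 1]])            # now 4 words per input
print(embedding_matrix[longer_batch].shape)        # (2, 4, 2): axis 1 tracks seq length
```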

This error propagates to the explanation below the cell, (an embedding vector that is an average of all words in the vocabulary). We are in fact averaging across the words in the input list, not across the entire vocabulary.
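Here is what the averaging actually does, again sketched in plain numpy; a Mean layer with axis=1 performs the equivalent reduction on the embedding output:

```python
import numpy as np

embedding_matrix = np.arange(3 * 2).reshape(3, 2)  # vocab_size=3, d_feature=2
batch = np.array([[0, 1, 2],
                  [2, 1, 0]])
embedded = embedding_matrix[batch]                 # (2, 3, 2)

# Average over axis 1, i.e. over the words of each input sequence,
# NOT over the vocabulary: one mean embedding per input.
mean_embedding = embedded.mean(axis=1)
print(mean_embedding.shape)                        # (2, 2) = (batch size, d_feature)
```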

From the lecture notes: here is the embedding matrix for the vocabulary:

And here is the mean layer being applied to calculate the mean word embedding for the input sentence:

The next section also has very confusing errors:

Pretend the embedding matrix uses 2 elements for embedding the meaning of a word and has a vocabulary size of 3. So it has shape (2,3)

The embedding matrix in Trax has a shape (vocab_size, embedding_size), so this should be (3,2).
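With that orientation, each row holds one word’s embedding, so a token id indexes a row. A tiny sketch (values made up):

```python
import numpy as np

embedding_matrix = np.array([[0.1, 0.2],   # embedding of token id 0
                             [0.3, 0.4],   # embedding of token id 1
                             [0.5, 0.6]])  # embedding of token id 2
print(embedding_matrix.shape)  # (3, 2) = (vocab_size, embedding_size)
print(embedding_matrix[1])     # lookup for token id 1 -> [0.3 0.4]
```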

# take the mean along axis 0
The same mistake from earlier is repeated here: neither in the lecture nor in Exercise 5 - classifier below do we take the mean of the embedding matrix directly. Instead, we take the mean of the output of the embedding layer. That output does not have any dimension that corresponds to the vocabulary size. This is really confusing.
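For reference, here is roughly how the mean is used in the Exercise 5 classifier. This is only a sketch, with illustrative sizes rather than the assignment’s actual hyperparameters:

```python
import trax.layers as tl

# Sketch of the classifier structure (sizes are illustrative).
# tl.Mean(axis=1) averages the EMBEDDING OUTPUT of shape
# (batch size, seq length, d_feature) over the seq-length axis;
# no axis of this tensor corresponds to the vocabulary size.
model = tl.Serial(
    tl.Embedding(vocab_size=9000, d_feature=256),  # illustrative sizes
    tl.Mean(axis=1),   # mean over the words in each input sequence
    tl.Dense(2),       # two output classes
    tl.LogSoftmax(),
)
```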

Hi @Izak_van_Zyl_Marais

Thank you for spotting this mistake and your detailed explanation.

This is true, and the code comment should be # Embedding layer will return an array of shape (batch size, **seq length**, d_feature)

Agreed. For the sake of clarity, the vocab size should not have matched the sequence length.

This might have been copied, or it could be a mistake, but the text in the parentheses is wrong, as you correctly spotted. It should have been ..average of all words in the **sequence**

This might not be considered an “error” because we are asked to “pretend” :slight_smile: . It is not in line with the course lecture or the usual use of the Embedding layer, but factually it is not an error. Later in the course (and in real life) the dimensions swap places (sometimes many times) for speed/efficiency or some other reason, so it could be some sort of exercise for later matrix manipulations.

I agree, and even more: I would argue that it should be stated explicitly for new learners rather than left to confuse them.

Anyway, thank you for your detailed explanation. I will submit it for fixing.

Thanks for the feedback.

Here is a small typo further on in the same notebook:

5.2 - Testing your Model on Validation Data

Now you will write test your model’s prediction accuracy on validation data.