Course 5 Week 4 Transformer Padding Mask

So I’ve a question about the introductory text.

It shows

[“Do”, “you”, “know”, “when”, “Jane”, “is”, “going”, “to”, “visit”, “Africa”],

being vectorised as

[ 71, 121, 4, 56, 99, 2344, 345, 1284, 15]

The first sentence has 10 words, but the vectorisation has only 9 numbers. Looking at the next rows, ‘15’ is the token for ‘Africa’, so doesn’t that mean there is a word missing from this vector? Sorry if that is a silly question.

One of the words is not in the vocabulary.
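(Not the course's actual tokenizer or vocabulary, but as a minimal sketch of what that can look like: a Keras `Tokenizer` with no OOV token silently skips words it has never seen, so a 10-word sentence can come out as 9 ids. Which word is missing below is just an illustrative guess, and the ids will not match the ones in the assignment.)

```python
# Hedged sketch: a Keras Tokenizer with no oov_token drops unknown words,
# so a 10-word sentence can be vectorised into only 9 ids.
from tensorflow.keras.preprocessing.text import Tokenizer

sentence = "Do you know when Jane is going to visit Africa"

# Fit the vocabulary on text that happens to lack one of the words
# ("Jane" is only an illustrative choice of the missing word).
tokenizer = Tokenizer()
tokenizer.fit_on_texts(["Do you know when is going to visit Africa"])

ids = tokenizer.texts_to_sequences([sentence])[0]
print(len(ids), ids)  # 9 ids: the unknown word is silently dropped
```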

Oh, OK, thank you! That was an amazingly speedy response. I'm not sure which timezone you're in, but I hope this isn't intruding on your out-of-work hours!

Would words missing from the vocabulary normally be dropped like that in a Transformer, or can there be a dummy token for words that aren't known, as there was in some of the previous models, IIRC?
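(For what it's worth, here is a hedged sketch of the dummy-token alternative the question describes, again using a Keras `Tokenizer` purely for illustration: passing `oov_token` makes unknown words map to a placeholder id instead of being dropped, so the sequence length is preserved. The ids and the choice of missing word are illustrative, not from the assignment.)

```python
# Hedged sketch: with oov_token set, unknown words map to a reserved
# placeholder id instead of being dropped, keeping the sequence length.
from tensorflow.keras.preprocessing.text import Tokenizer

sentence = "Do you know when Jane is going to visit Africa"

tokenizer = Tokenizer(oov_token="<unk>")
tokenizer.fit_on_texts(["Do you know when is going to visit Africa"])

ids = tokenizer.texts_to_sequences([sentence])[0]
print(len(ids), ids)                   # 10 ids: "Jane" becomes the <unk> id
print(tokenizer.word_index["<unk>"])   # the reserved out-of-vocabulary index (1)
```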