The first sentence has 10 words, but the vectorisation has only 9 numbers. Looking at the next rows, ‘15’ is the token for ‘Africa’, but doesn’t that mean a word is missing from this vector? Sorry if that’s a silly question.
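To make the question concrete, here’s a toy sketch of the behaviour I think I’m seeing (the vocabulary and ids are made up, just to illustrate): a vectoriser that silently skips words it doesn’t recognise would produce exactly this kind of length mismatch.

```python
# Toy illustration only: made-up vocabulary and ids, not the article's.
vocab = {"jane": 3, "visits": 7, "africa": 15, "in": 4, "september": 9}

def vectorise(sentence):
    """Map each known word to its id; unknown words are silently dropped."""
    return [vocab[w] for w in sentence.lower().split() if w in vocab]

print(vectorise("Jane visits Africa in September"))
# [3, 7, 15, 4, 9]  -> 5 words, 5 numbers

print(vectorise("Jane quietly visits Africa in September"))
# [3, 7, 15, 4, 9]  -> 6 words, but still 5 numbers ("quietly" vanished)
```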
Oh, OK, thank you, that was an amazingly speedy response! Not sure which timezone you’re in, but I hope it’s not intruding on your out-of-work hours!
Would words missing from the dictionary normally be dropped like that in a Transformer, or can there be a dummy token for words that aren’t known, as there was in some earlier models, IIRC?
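Roughly the two behaviours I have in mind, as a sketch (again with made-up ids and a plain word-level vocabulary):

```python
# Sketch of the two behaviours, with made-up ids and a word-level vocab.
vocab = {"jane": 3, "visits": 7, "africa": 15}
UNK_ID = 0  # dummy token id standing in for any unknown word

def drop_unknown(words):
    # Behaviour 1: out-of-vocabulary words simply vanish from the output.
    return [vocab[w] for w in words if w in vocab]

def map_to_unk(words):
    # Behaviour 2: out-of-vocabulary words all become a shared <unk> token.
    return [vocab.get(w, UNK_ID) for w in words]

words = ["jane", "quietly", "visits", "africa"]
print(drop_unknown(words))  # [3, 7, 15]     -- "quietly" is gone
print(map_to_unk(words))    # [3, 0, 7, 15]  -- "quietly" -> <unk>
```

From what I remember, older sequence models used the second approach with an explicit `<unk>` token, whereas modern Transformer tokenizers (BPE/WordPiece) mostly sidestep the problem by splitting an unseen word into smaller subword pieces that are in the vocabulary.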