In the Week 4 Assignment (Transformer Network), Section 2.1 of the instructions says:
2.1 - Padding Mask
Oftentimes your input sequence will exceed the maximum length of a sequence your network can process. Let's say the maximum length of your model is five, and it is fed the following sequences:
[["Do", "you", "know", "when", "Jane", "is", "going", "to", "visit", "Africa"],
["Jane", "visits", "Africa", "in", "September" ],
["Exciting", "!"]
]
which might get vectorized as:
[[ 71, 121, 4, 56, 99, 2344, 345, 1284, 15],
[ 56, 1285, 15, 181, 545],
[ 87, 600]
]
How come the sequence “Do you know when Jane is going to visit Africa” has 10 words, but it is vectorized to only 9 numbers, [71, 121, 4, 56, 99, 2344, 345, 1284, 15]?
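For comparison, here is a minimal sketch of my own (not the assignment's actual preprocessing) showing how a word-level tokenizer normally maps each word to exactly one ID, using tf.keras utilities; the vocabulary and the resulting IDs are illustrative and will not match the numbers in the notebook:

```python
import tensorflow as tf

# The three example sentences from the instructions.
sentences = [
    "Do you know when Jane is going to visit Africa",
    "Jane visits Africa in September",
    "Exciting !",
]

# Word-level tokenizer: one ID per word (filters='' keeps the "!" as a token).
tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='', lower=True)
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)

for sent, seq in zip(sentences, sequences):
    # Each word yields exactly one ID, so the first sentence gives 10 IDs.
    print(len(sent.split()), len(seq), seq)

# Padding/truncating to the maximum length of five mentioned in the instructions.
padded = tf.keras.preprocessing.sequence.pad_sequences(
    sequences, maxlen=5, padding='post', truncating='post', value=0)
print(padded)
```

With one ID per word, I would expect the first sentence to produce 10 numbers before any padding or truncation to the maximum length of five, which is why the 9-number example in the instructions confuses me.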