In the optional Transformer Preprocessing
Lab, I came across this:
Define the embedding dimension as 100. This value must match the dimensionality of the word embedding. In the “Attention is All You Need” paper, embedding sizes range from 100 to 1024, depending on the task. The authors also use a maximum sequence length ranging from 40 to 512 depending on the task. Define the maximum sequence length to be 100, and the maximum number of words to be 64.
Question: what is the difference between the maximum sequence length and the maximum number of words? Isn't a sequence made up of words (so if the maximum number of words is 64, wouldn't the maximum sequence length also be 64)? Why define both?
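
For context, here is roughly how I picture these settings being used in a Keras-style preprocessing pipeline. This is my own sketch, not the lab's code, and the constant names (MAX_NB_WORDS, MAX_SEQUENCE_LENGTH, EMBEDDING_DIM) and sample sentences are just placeholders. My guess is that "maximum number of words" is a vocabulary cap (like the tokenizer's num_words) rather than a per-sentence length, but that's exactly what I'm unsure about:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding

MAX_NB_WORDS = 64          # my guess: cap on vocabulary size (distinct tokens kept)
MAX_SEQUENCE_LENGTH = 100  # length every sequence is padded/truncated to
EMBEDDING_DIM = 100        # size of each word's embedding vector

sentences = [
    "the quick brown fox jumps over the lazy dog",
    "transformers use attention instead of recurrence",
]

# num_words limits how many distinct words receive an index (vocabulary size)
tokenizer = Tokenizer(num_words=MAX_NB_WORDS)
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)

# maxlen fixes how many tokens each example has after padding/truncation
padded = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH, padding="post")
print(padded.shape)  # (2, 100): 2 sentences, each padded to 100 token positions

# The embedding maps each of the (at most) MAX_NB_WORDS token ids
# to a vector of EMBEDDING_DIM values.
embedding = Embedding(input_dim=MAX_NB_WORDS, output_dim=EMBEDDING_DIM)
embedded = embedding(padded)
print(embedded.shape)  # (2, 100, 100)
```

If that reading is right, the two numbers control different things (how many distinct words the tokenizer keeps vs. how long each padded sequence is), but I'd appreciate confirmation that this is what the lab means.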