04_Data_preparation_lab_student - tokenize_function

Once I tokenize the dataset, shouldn't all the samples have the same length? However, in tokenize_function I noticed that, for each example passed in, max_length is set to the minimum of that example's tokenized length and 2048. So if we have two examples that tokenize to lengths 100 and 150, they will still have different lengths after the function processes them.

# from tokenize_function: max_length is computed per example
max_length = min(
    tokenized_inputs["input_ids"].shape[1],
    2048
)
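To make the behaviour concrete, here is a minimal sketch of what I understand the function to be doing. The checkpoint name EleutherAI/pythia-70m and the helper tokenize_one are my own placeholders for illustration, not necessarily what the lab uses:

from transformers import AutoTokenizer

# placeholder checkpoint, chosen only so the sketch runs
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
tokenizer.pad_token = tokenizer.eos_token  # GPT-style tokenizers have no pad token by default

def tokenize_one(text):
    # First pass: tokenize with no cap, just to measure this example
    tokenized_inputs = tokenizer(text, return_tensors="np", padding=True)
    # Per-example cap: this example's own length, or 2048 if it is longer
    max_length = min(tokenized_inputs["input_ids"].shape[1], 2048)
    # Second pass: truncate to that per-example max_length
    return tokenizer(text, return_tensors="np", truncation=True, max_length=max_length)

short_out = tokenize_one("a short example")
long_out = tokenize_one("a much longer example, repeated " * 20)
print(short_out["input_ids"].shape)  # (1, N_short)
print(long_out["input_ids"].shape)   # (1, N_long) -- the two lengths differ

The two printed shapes differ, which is exactly the behaviour described above.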

Please help.
Thanks,
Arindam Dey