In Lesson 3, when the data was packed, the tokenized texts were concatenated and then split into chunks of the maximum sequence length. This doesn't seem ideal to me: because the split ignores document boundaries, two independent and unrelated sentences can end up in the same input, so the model may try to predict one sentence conditioned on earlier text that has no connection to it.
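To make the concern concrete, here is a minimal sketch of the kind of packing I mean (not the exact Lesson 3 code). The function name `pack_examples`, the parameters `max_seq_length` and `eos_token_id`, and the token IDs in the example are just illustrative:

```python
def pack_examples(tokenized_docs, max_seq_length=128, eos_token_id=None):
    """Concatenate tokenized documents, then split into fixed-size chunks.

    tokenized_docs: list of lists of input IDs, one list per document.
    eos_token_id: if given, inserted between documents so the boundary is
    at least marked, though attention still crosses it within a chunk.
    """
    ids = []
    for doc in tokenized_docs:
        ids.extend(doc)
        if eos_token_id is not None:
            ids.append(eos_token_id)
    # Drop the trailing remainder that doesn't fill a full chunk.
    total = (len(ids) // max_seq_length) * max_seq_length
    return [ids[i:i + max_seq_length] for i in range(0, total, max_seq_length)]


# Two unrelated "documents": the middle chunk straddles the boundary,
# so tokens from the second document attend to tokens from the first.
docs = [[101, 7, 8, 9, 102], [101, 40, 41, 42, 43, 44, 102]]
print(pack_examples(docs, max_seq_length=4))
# [[101, 7, 8, 9], [102, 101, 40, 41], [42, 43, 44, 102]]
```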
Is there a better approach for handling this?