Something about the way the data is packed seems off to me. I wouldn’t expect an eos_token_id to be followed directly by a bos_token_id and a new sentence. I would expect an eos_token_id to be followed by more eos_token_id tokens until the sequence length is reached. Otherwise, it might train the model to assume that any sentence can follow any other sentence.
Have you had the chance to try this in production to see the impact? By that I mean explicitly inserting a kind of full stop, i.e. filling the rest of the sequence with eos_token_id up to the end. I’m curious because this approach seems more logical to me, but I’m unsure whether it would actually improve the results.
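For concreteness, here is a minimal sketch of the two layouts I mean (the token ids and sentence contents are made up purely for illustration):

```python
eos_token_id = 2
bos_token_id = 1
seq_len = 8

sent_a = [bos_token_id, 11, 12, 13, eos_token_id]  # <bos> ... <eos>
sent_b = [bos_token_id, 21, 22, eos_token_id]

# Current packing: sentences are concatenated, so an <eos> is immediately
# followed by the <bos> of an unrelated sentence.
packed = (sent_a + sent_b)[:seq_len]
# -> [1, 11, 12, 13, 2, 1, 21, 22]

# What I would have expected: one sentence per sequence, with the tail
# filled with eos_token_id up to seq_len, acting as a full stop.
padded = sent_a + [eos_token_id] * (seq_len - len(sent_a))
# -> [1, 11, 12, 13, 2, 2, 2, 2]

print(packed)
print(padded)
```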
By the way, I really enjoyed the course—great job!