Machine Translation Transformer keeps predicting the same outputs repeatedly

I built and tried to train my own machine translation Transformer with the standard encoder-decoder architecture, keeping it as close as possible to the original “Attention Is All You Need” paper.

The problem is that my model keeps repeating the same output and does not generate an <EOS> token until many timesteps later. The output looks something like this for an English-to-Spanish translation:

>>> inference("i like to swim", model, dataset, DEV, 50)
'<SOS> me gusta nadar a nadar te gusta nadar a nadar a nadar les gusta nadar a me gusta nadar como me gusta nadar <EOS>'

The correct output should just be ‘me gusta nadar’, which the model does generate, but then it keeps going and repeats previous outputs again and again.

What could be the reason for behaviour like this? For context, I trained the model for 20 epochs on a dataset of 130,000 translation pairs. Should I just keep training for longer?
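For reference, my inference function is essentially a greedy decoding loop like the sketch below, where the last argument caps the number of decoding steps (this is a simplified sketch; helper names like encode_src, decode_tgt, sos_idx and eos_idx are placeholders, not my exact code):

import torch

def inference(sentence, model, dataset, device, max_len):
    # encode the source sentence into token ids, shape (1, src_len)
    src = torch.tensor([dataset.encode_src(sentence)], device=device)

    model.eval()
    with torch.no_grad():
        memory = model.encode(src)                  # encoder output, computed once
        out_ids = [dataset.sos_idx]                 # decoding starts from <SOS>

        for _ in range(max_len):
            tgt = torch.tensor([out_ids], device=device)
            logits = model.decode(tgt, memory)      # assumed shape (1, len(out_ids), vocab_size)
            next_id = logits[0, -1].argmax().item() # greedy: take the most probable next token
            out_ids.append(next_id)
            if next_id == dataset.eos_idx:          # stop as soon as <EOS> is produced,
                break                               # otherwise run until the max_len cap

    return " ".join(dataset.decode_tgt(out_ids))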

I believe in this case you have to limit the output. As far as I understand, the model keeps outputting the next most probable word, so if you limit the number of output tokens it should give what you expect.

Yes, but how can I hardcode an output limit when the number of output words varies with the input sequence? I thought the point of the EOS token was for the model to recognize when the translation is complete?
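For what it’s worth, the training targets are prepared the usual way, with <EOS> appended and the labels shifted by one position, so the model is explicitly trained to predict <EOS> as the final token (simplified sketch with placeholder names):

# teacher forcing: decoder input starts with <SOS>, labels end with <EOS>
tgt_ids = [dataset.sos_idx] + dataset.encode_tgt("me gusta nadar") + [dataset.eos_idx]
decoder_input = tgt_ids[:-1]   # <SOS> me gusta nadar
labels        = tgt_ids[1:]    # me gusta nadar <EOS>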

I am not familiar with this right now; maybe another mentor can give a better explanation.

Update for anyone else who runs into the same problem:

The model does in fact need to be trained for more epochs. With more training, the generated sequences become shorter and less repetitive, and the model starts emitting the <EOS> token much earlier.
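One easy way to watch this improve from epoch to epoch is to track the average length of the generated sequences on a few held-out sentences, something like this (rough sketch; val_sentences is just a placeholder list of source-language strings):

# average number of generated tokens on a handful of validation sentences
lengths = [len(inference(s, model, dataset, DEV, 50).split()) for s in val_sentences]
print("average output length:", sum(lengths) / len(lengths))

As training progresses, this number drops towards realistic translation lengths.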

A larger dataset would also help the model to generalise better, since short sequences like these were not present in the dataset I was training on.