Machine Translation Transformer keeps predicting the same outputs repeatedly

I built and tried to train my own machine translation Transformer with the standard encoder-decoder architecture, keeping it as close as possible to the original “Attention Is All You Need” paper.

The problem is that my model keeps repeating the same output and does not generate an <EOS> token until many timesteps later. The output looks something like this for an English-to-Spanish translation:

>>> inference("i like to swim", model, dataset, DEV, 50)
'<SOS> me gusta nadar a nadar te gusta nadar a nadar a nadar les gusta nadar a me gusta nadar como me gusta nadar <EOS>'

The correct output should just be ‘me gusta nadar’, which the model does generate, but then it keeps going and repeats previous outputs again and again.

What could be the reason for behaviour like this? For context, I trained the model for 20 epochs on a dataset of 130,000 translation pairs. Should I just keep training for longer?
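For reference, my inference function is essentially a greedy decoding loop like the sketch below, where the last argument caps the number of decoding steps (this is a simplified sketch; helper names like encode_src, decode_tgt, sos_idx and eos_idx are placeholders, not my exact code):

import torch

def inference(sentence, model, dataset, device, max_len):
    # encode the source sentence into token ids, shape (1, src_len)
    src = torch.tensor([dataset.encode_src(sentence)], device=device)

    model.eval()
    with torch.no_grad():
        memory = model.encode(src)                  # encoder output, computed once
        out_ids = [dataset.sos_idx]                 # decoding starts from <SOS>

        for _ in range(max_len):
            tgt = torch.tensor([out_ids], device=device)
            logits = model.decode(tgt, memory)      # assumed shape (1, len(out_ids), vocab_size)
            next_id = logits[0, -1].argmax().item() # greedy: take the most probable next token
            out_ids.append(next_id)
            if next_id == dataset.eos_idx:          # stop as soon as <EOS> is produced,
                break                               # otherwise run until the max_len cap

    return " ".join(dataset.decode_tgt(out_ids))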

I believe in this case you have to limit the output. As far as I understand, the model keeps outputting the next most probable word, so if you limit the number of output tokens it should give what you expect.

Yes, but how can I hardcode an output limit when the number of output words varies with the input sequence? I thought the point of the EOS token was for the model to recognize when the translation is complete?
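For what it’s worth, the training targets are prepared the usual way, with <EOS> appended and the labels shifted by one position, so the model is explicitly trained to predict <EOS> as the final token (simplified sketch with placeholder names):

# teacher forcing: decoder input starts with <SOS>, labels end with <EOS>
tgt_ids = [dataset.sos_idx] + dataset.encode_tgt("me gusta nadar") + [dataset.eos_idx]
decoder_input = tgt_ids[:-1]   # <SOS> me gusta nadar
labels        = tgt_ids[1:]    # me gusta nadar <EOS>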

I am not familiar with this right now; maybe another mentor can give a better explanation.

Update for anyone else who runs into the same problem:

The model does in fact need to be trained for more epochs. With more training, the generated sequences become shorter and less repetitive, and the model starts emitting the <EOS> token much earlier.
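One easy way to watch this improve from epoch to epoch is to track the average length of the generated sequences on a few held-out sentences, something like this (rough sketch; val_sentences is just a placeholder list of source-language strings):

# average number of generated tokens on a handful of validation sentences
lengths = [len(inference(s, model, dataset, DEV, 50).split()) for s in val_sentences]
print("average output length:", sum(lengths) / len(lengths))

As training progresses, this number drops towards realistic translation lengths.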

A larger dataset would also help the model to generalise better, since short sequences like these were not present in the dataset I was training on.