Generated sequences are too short during Machine Translation

I am creating a machine translation model that takes in an English sequence and outputs a Tamil one. I have been running into a big issue where sequences generated recursively predict the end-of-sequence (EOS) token too quickly and come out very short. I am using a basic transformer model. By epoch 20 of training, the model gets an accuracy of 96% but is horrible during recursive generation. The model does work well when the decoder gets the teacher-forced input. This makes me think something is wrong with the teacher forcing / causal mask. I would really appreciate it if someone could point out an error in my code.
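For anyone comparing against their own code: this symptom (high teacher-forced accuracy, premature EOS during generation) is often a mask or decoding-loop bug. Below is a minimal sketch of a causal mask and greedy decode loop for reference, assuming PyTorch's `nn.Transformer` boolean-mask convention (`True` = blocked position) and hypothetical `BOS`/`EOS` token ids; substitute your own model and tokenizer values.

```python
import torch

# Hypothetical token ids -- replace with your tokenizer's actual values.
BOS, EOS = 1, 2

def causal_mask(size: int) -> torch.Tensor:
    """Boolean mask where True marks positions the decoder may NOT attend to,
    matching the convention of nn.Transformer / nn.MultiheadAttention."""
    return torch.triu(torch.ones(size, size, dtype=torch.bool), diagonal=1)

@torch.no_grad()
def greedy_decode(model, src, max_len: int = 50) -> torch.Tensor:
    """Autoregressive generation: feed predictions back one token at a time,
    rebuilding the causal mask to match the growing target sequence."""
    ys = torch.tensor([[BOS]])
    for _ in range(max_len):
        tgt_mask = causal_mask(ys.size(1))
        logits = model(src, ys, tgt_mask=tgt_mask)  # (1, tgt_len, vocab)
        # Take the prediction for the LAST position only -- a common bug is
        # picking argmax over the wrong position or the whole sequence.
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ys = torch.cat([ys, next_token], dim=1)
        if next_token.item() == EOS:
            break
    return ys
```

Two things worth double-checking against this sketch: the decoder input during training must be the target shifted right (`[BOS, y1, ..., y(n-1)]` as input, `[y1, ..., yn, EOS]` as the loss target), and the causal mask must grow with the sequence at inference time exactly as it did during training.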
