Hi @Alex_Tu
You are probably passing logits that come from an incorrect call, since the check seems to be getting the right-shifted translation shape rather than vocab_size.
The places I would check are the translator first, then the decoder; if both are fine, go back one graded cell, since 12000 is the vocab size.
Check whether you used the correct pre_attention or post_attention.rnn, and the call() in the decoder.
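If it helps, here is a rough sketch of the kind of shape check I mean. It assumes the logits should be (batch, sequence length, vocab_size) while the right-shifted translation is (batch, sequence length). The helper name check_logits_shape and the batch and sequence sizes are made up for illustration and are not from the assignment; only 12000 comes from your error.

```python
import numpy as np

VOCAB_SIZE = 12000      # vocab size mentioned above
BATCH, SEQ_LEN = 2, 5   # hypothetical sizes, for illustration only


def check_logits_shape(logits, target, vocab_size):
    """Return True if logits have shape target.shape + (vocab_size,)."""
    return (
        logits.ndim == 3
        and logits.shape[:2] == target.shape
        and logits.shape[-1] == vocab_size
    )


# Stand-in for the right-shifted translation: one token id per position.
target = np.zeros((BATCH, SEQ_LEN), dtype=np.int64)

# Expected case: the last axis is the vocabulary.
good_logits = np.zeros((BATCH, SEQ_LEN, VOCAB_SIZE), dtype=np.float32)

# Example of a wrong output: the last axis is not the vocabulary.
bad_logits = np.zeros((BATCH, SEQ_LEN, SEQ_LEN), dtype=np.float32)

print(check_logits_shape(good_logits, target, VOCAB_SIZE))
print(check_logits_shape(bad_logits, target, VOCAB_SIZE))
```

Printing the shape of your own logits right after the decoder call and comparing it this way should tell you which cell is producing the wrong last dimension.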
I am sharing a comment from a similar thread; see if it helps.
Let me know if anything is unclear.
Regards
DP