If you compare your output to the expected output, only the logit shape matches,
so I would check your cross-attention code as well as the decoder's call function.
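For reference, here is a minimal NumPy sketch of cross-attention. It assumes the standard Transformer setup and is not the assignment's actual code. The queries come from the decoder's hidden states, while the keys and values both come from the encoder output. The function names and the mask convention (1 = keep, 0 = block) are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v, mask=None):
    # q: (..., len_q, d_k), k: (..., len_k, d_k), v: (..., len_k, d_v)
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Assumed convention: mask == 1 keeps a position, mask == 0 blocks it
        scores = np.where(mask == 1, scores, -1e9)
    weights = softmax(scores, axis=-1)  # (..., len_q, len_k)
    return weights @ v, weights

def cross_attention(decoder_hidden, encoder_output, padding_mask=None):
    # Hypothetical helper: queries from the decoder, keys AND values from the encoder
    return scaled_dot_product_attention(
        decoder_hidden, encoder_output, encoder_output, padding_mask
    )

rng = np.random.default_rng(0)
dec = rng.standard_normal((2, 5, 8))  # (batch, target_len, d_model)
enc = rng.standard_normal((2, 7, 8))  # (batch, input_len, d_model)
out, attn = cross_attention(dec, enc)
print(out.shape, attn.shape)  # (2, 5, 8) (2, 5, 7)
```

In this sketch the attention weights have shape (batch, target_len, input_len). If yours come out as (batch, target_len, target_len), the keys are probably coming from the decoder rather than the encoder. Keep in mind that a matching output shape alone does not mean the values are right, which is why the logits can have the correct shape and still not match.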