Your error is telling you to check the code for the masked weights, i.e.
the step where the mask is added to the scaled tensor.
The boolean mask parameter can be passed in as None,
or as either a padding mask or a look-ahead mask.
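Compute ((1. - mask) * -1e9) and add it to the scaled logits before applying the softmax, so that positions where the mask is 0 get a very large negative value and end up with near-zero weight.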
Perhaps check whether you missed the extra pair of parentheses around that term when adding it to the scaled attention logits; leaving them out can cause the error.
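
In case it helps, here is a minimal sketch of the masking step. It is written in plain NumPy rather than your framework, so treat it as an illustration and not as your actual code. The function names and the mask convention (1 = attend, 0 = block) are my assumptions; if your mask uses the opposite convention, add (mask * -1e9) instead.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Sketch only. Assumes mask holds 1 where attention is allowed, 0 where blocked."""
    dk = k.shape[-1]
    # Scaled attention logits: (q . k^T) / sqrt(dk)
    logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(dk)
    if mask is not None:
        # Note the parentheses: the whole (1 - mask) term is scaled by -1e9
        # before it is added to the logits.
        logits = logits + ((1.0 - mask) * -1e9)
    # Numerically stable softmax over the last axis
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def look_ahead_mask(size):
    """Hypothetical helper: lower-triangular mask, 1 = may attend, 0 = future position."""
    return np.tril(np.ones((size, size)))
```

Passing mask=None skips the masking step entirely. A padding mask of shape (1, seq_len) broadcasts across the query rows in the same way the look-ahead mask does.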
Regards
DP