In the final graded assignment of this course, there is a function create_look_ahead_mask(sequence_length).
It returns a square lower-triangular matrix whose non-zero entries are all 1s, and the lower triangle here includes the diagonal.
However, I think that the diagonal terms should be 0.
My reasoning: when the decoder is predicting the Nth word of the translation, it should not be able to see the Nth word of the target.
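To make the comparison concrete, here is a minimal NumPy sketch of the two masks: the one the assignment returns (diagonal included) and the one I'm proposing (diagonal zeroed). The function names are hypothetical and this is not the assignment's actual code. I'm also assuming that a 1 means "this position may be attended to" and a 0 means "masked out".

```python
import numpy as np

def look_ahead_mask(sequence_length):
    # Hypothetical stand-in for the assignment's mask.
    # Lower triangle with the diagonal (k=0): row i has 1s in columns 0..i,
    # so position i may attend to positions <= i.
    return np.tril(np.ones((sequence_length, sequence_length)))

def strict_look_ahead_mask(sequence_length):
    # Hypothetical version of what I'm proposing: diagonal zeroed (k=-1),
    # so position i may attend only to positions < i.
    return np.tril(np.ones((sequence_length, sequence_length)), k=-1)

print(look_ahead_mask(3))
# [[1. 0. 0.]
#  [1. 1. 0.]
#  [1. 1. 1.]]
print(strict_look_ahead_mask(3))
# [[0. 0. 0.]
#  [1. 0. 0.]
#  [1. 1. 0.]]
```

One thing I notice with the strict version: its first row has no 1s at all, so the first position would have nothing it is allowed to attend to.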
Assuming the assignment is correct, where is my reasoning wrong?