Why are the diagonal components of the "look-ahead mask" 1, not 0?

In the final graded assignment of this course, we have the function create_look_ahead_mask(sequence_length).

This function returns a lower-triangular square matrix with 1s as its non-zero entries. Here, the lower-triangular portion includes the diagonal.
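
For concreteness, here is a minimal sketch of such a mask, assuming TensorFlow and the convention that 1 means "this position may be attended to" (the exact implementation in the assignment may differ):

```python
import tensorflow as tf

def create_look_ahead_mask(sequence_length):
    # Lower-triangular matrix of ones, diagonal included:
    # entry (i, j) is 1 when position i may attend to position j.
    return tf.linalg.band_part(tf.ones((sequence_length, sequence_length)), -1, 0)

print(create_look_ahead_mask(4))
# [[1. 0. 0. 0.]
#  [1. 1. 0. 0.]
#  [1. 1. 1. 0.]
#  [1. 1. 1. 1.]]
```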
However, I think the diagonal terms should be 0.
Reason: when the decoder is predicting the Nth word of the translation, it should not be able to see the Nth word of the target.

Assuming the assignment is correct, why am I wrong?

Hello @Minwoo_Kim1,

Essentially the question is about why the first row of the look-ahead mask is [1, 0, 0, ...] instead of a row of zeros.

I think that’s because of the <SOS> token. If the model is to predict the sentence "Hello World", the decoder input is the target shifted right, i.e. "<SOS> Hello World", followed by a certain number of padding tokens. So, when predicting "Hello", the decoder attends to the position holding <SOS>, which should not be masked out, and that is why the first row of the look-ahead mask is [1, 0, 0, ...]. More generally, the diagonal 1 lets position N see the (N-1)th target word (or <SOS>), never the word being predicted.
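
To make the shift explicit, here is a small illustrative sketch (the token strings and padding length are hypothetical) showing which decoder-input positions each prediction may attend to under that mask:

```python
# Hypothetical 4-position example: decoder input is the target shifted right by <SOS>.
decoder_input = ["<SOS>", "Hello", "World", "<pad>"]
target        = ["Hello", "World", "<EOS>", "<pad>"]

mask = [
    [1, 0, 0, 0],  # predicting "Hello": may attend to <SOS> only
    [1, 1, 0, 0],  # predicting "World": may attend to <SOS>, "Hello"
    [1, 1, 1, 0],  # predicting <EOS>:   may attend to <SOS>, "Hello", "World"
    [1, 1, 1, 1],
]

for i, row in enumerate(mask):
    visible = [tok for tok, keep in zip(decoder_input, row) if keep]
    print(f"predict {target[i]!r:9} after seeing {visible}")
```

Because of the right shift, the diagonal entry at position N exposes only the previous target word, so keeping it at 1 never leaks the word being predicted.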

Cheers,
Raymond