I assume this mask is used when computing the cross entropy during training: since we set the mask to 1 for the summary part, the cross entropy is computed only for the summary part. But I did not find any part of the scripts indicating this… Or is it already specified in the source code, where the training data (sentence, sentence, mask) is passed to the model?
Yes, you are correct: that is not visible in the assignment code. It is handled inside the TrainTask. In UNQ_C8 you pass train_gen (or train_batch_stream) to it, and each batch carries this mask.
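To make the idea concrete, here is a minimal NumPy sketch of a mask-weighted cross entropy. It is not the library's actual code; the function name `masked_cross_entropy` and the array shapes are assumptions chosen for illustration. It only shows why positions with mask 0 contribute nothing to the loss.

```python
import numpy as np

def masked_cross_entropy(log_probs, targets, mask):
    """Hypothetical helper: average cross entropy over positions where mask == 1.

    log_probs: (batch, seq_len, vocab) log-softmax outputs of the model
    targets:   (batch, seq_len) integer token ids
    mask:      (batch, seq_len) 1 for summary tokens, 0 everywhere else
    """
    # Pick the log-probability assigned to the correct token at each position.
    token_log_probs = np.take_along_axis(
        log_probs, targets[..., None], axis=-1
    ).squeeze(-1)
    # Multiplying by the mask zeroes out every non-summary position.
    total = -(token_log_probs * mask).sum()
    # Normalize by the number of unmasked tokens, not the full sequence length.
    return total / mask.sum()

# Toy batch: one sequence of 4 tokens, vocabulary of 5, uniform predictions.
log_probs = np.log(np.full((1, 4, 5), 0.2))
targets = np.array([[1, 2, 3, 4]])
mask = np.array([[0, 0, 1, 1]])  # only the last two tokens are "summary"
loss = masked_cross_entropy(log_probs, targets, mask)
```

Changing the predictions at the first two positions leaves `loss` unchanged, which is the effect the mask is meant to have during training.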