Please explain the comment: Notice that both encoder and decoder padding masks are equal

Hi @Lenny_Tevlin

It’s a good question.
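In simple words: we tell the decoder not to pay attention to the padding tokens of the document to be summarized. The decoder padding mask is used in the decoder's cross-attention over the encoder output, so it is built from the same input document as the encoder padding mask. That is why the two are equal.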
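
A minimal sketch of that idea in NumPy, not the assignment's actual code. The names `create_padding_mask`, `PAD_ID`, and the example token ids are my own assumptions, and so is the convention that 1 means "attend" and 0 means "ignore":

```python
import numpy as np

PAD_ID = 0  # assumption: the padding token id is 0


def create_padding_mask(token_ids):
    """Return 1.0 where there is a real token and 0.0 where there is padding."""
    token_ids = np.asarray(token_ids)
    mask = (token_ids != PAD_ID).astype(np.float32)
    # Shape (batch, 1, seq_len) so the mask broadcasts over query positions.
    return mask[:, np.newaxis, :]


# Hypothetical tokenized document (the encoder input), padded to length 5.
document = np.array([[7, 6, 3, 0, 0]])

# Encoder self-attention: ignore the padded positions of the document.
enc_padding_mask = create_padding_mask(document)

# Decoder cross-attention: its keys and values are the encoder output, which
# has one position per document token, so the mask comes from the document too.
dec_padding_mask = create_padding_mask(document)

print(np.array_equal(enc_padding_mask, dec_padding_mask))
```

Both masks come from the same `document` array, so they are identical by construction.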

Also note that this is different from the look_ahead_mask (causal mask), where each decoder position is only allowed to pay attention to itself and the tokens before it.
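
For contrast, here is a small sketch of a causal mask in NumPy. The function name `create_look_ahead_mask` and the 1 = "attend" / 0 = "ignore" convention are assumptions for illustration; the assignment's own helper may differ:

```python
import numpy as np


def create_look_ahead_mask(size):
    """Lower-triangular mask: row i has 1.0 for positions 0..i and 0.0 after."""
    return np.tril(np.ones((size, size), dtype=np.float32))


# Row i is the query position, column j is the position it may attend to.
look_ahead_mask = create_look_ahead_mask(4)
print(look_ahead_mask)
```

This mask depends only on the target sequence length, not on where the padding sits in the document, which is why it is a separate thing from the padding masks.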

Check my previous explanation with an example; it might add clarity.

