It’s a good question.
In simple words: we tell the decoder not to pay attention to the padding tokens of the document being summarized.
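Here's a minimal NumPy sketch of what such a padding mask looks like (my own illustration, not code from the original post; it assumes token ID 0 is padding and the `(batch, 1, 1, seq_len)` mask shape used in the usual TensorFlow Transformer tutorial):

```python
import numpy as np

def create_padding_mask(seq):
    # 1.0 marks positions attention must NOT look at (padding tokens),
    # assuming token ID 0 is the padding token.
    mask = (seq == 0).astype(np.float32)
    # Add broadcast dims so the same mask applies to every attention head
    # and every query position: (batch, 1, 1, key_len).
    return mask[:, np.newaxis, np.newaxis, :]

seq = np.array([[7, 6, 0, 0, 1],
                [1, 2, 3, 0, 0]])
print(create_padding_mask(seq))
# [[[[0. 0. 1. 1. 0.]]]   attention to the two padding positions is blocked
#  [[[0. 0. 0. 1. 1.]]]]
```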
Also note that this is different from the look_ahead_mask (causal mask), which allows the decoder to attend only to the current token and the tokens before it.
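For contrast, here's a sketch of the look_ahead_mask in the same style (again my own illustration, with 1.0 marking the future positions each token is forbidden to attend to):

```python
import numpy as np

def create_look_ahead_mask(size):
    # Upper triangle strictly above the diagonal = future tokens,
    # which each query position is not allowed to see.
    return 1.0 - np.tril(np.ones((size, size), dtype=np.float32))

print(create_look_ahead_mask(4))
# [[0. 1. 1. 1.]   token 0 may attend only to itself
#  [0. 0. 1. 1.]   token 1 may attend to tokens 0-1
#  [0. 0. 0. 1.]   ...
#  [0. 0. 0. 0.]]  the last token may attend to everything before it
```

In a typical implementation, either mask is multiplied by a large negative number and added to the scaled attention scores before the softmax, which drives the masked positions' attention weights to (near) zero.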
Check my previous explanation with an example; it might add clarity.
Cheers