I do not understand this line in the train_step function of the C4W2_Assignment. Why are the encoder and decoder padding masks equal? In other words, why
enc_padding_mask= dec_padding_mask = create_padding_mask(encoder_input)
It’s a good question.
In simple words: the decoder's cross-attention attends over the encoder output, so it needs a mask telling it not to pay attention to the padding tokens of the document being summarized. Those padding positions are determined by encoder_input, so the same mask serves both the encoder's self-attention (enc_padding_mask) and the decoder's cross-attention (dec_padding_mask).
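To make that concrete, here is a minimal sketch of a padding mask, assuming the padding id is 0 and following the TensorFlow Transformer tutorial's convention that 1.0 marks a position to be masked out. The assignment's actual helper may differ in details, and the encoder_input here is just a toy batch:

```python
import tensorflow as tf

def create_padding_mask(seq, pad_id=0):
    # 1.0 marks padding positions, 0.0 marks real tokens.
    # Shape (batch, 1, 1, seq_len) broadcasts over heads and query positions.
    mask = tf.cast(tf.math.equal(seq, pad_id), tf.float32)
    return mask[:, tf.newaxis, tf.newaxis, :]

# Hypothetical batch: two real tokens followed by two pads.
encoder_input = tf.constant([[5, 7, 0, 0]])

# Both masks come from encoder_input because both attention blocks that
# use them attend over the *encoder* sequence:
#   - enc_padding_mask: encoder self-attention
#   - dec_padding_mask: decoder cross-attention (keys/values = encoder output)
enc_padding_mask = dec_padding_mask = create_padding_mask(encoder_input)
print(enc_padding_mask)  # [[[[0. 0. 1. 1.]]]]
```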
Also note that this is different from the look_ahead_mask
(causal mask), which applies to the decoder's self-attention: there, each position is only allowed to pay attention to itself and the previous tokens.
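For contrast, a minimal sketch of the causal mask under the same 1.0 = masked convention:

```python
def create_look_ahead_mask(size):
    # Strictly upper-triangular mask: 1.0 above the diagonal, so position i
    # can only attend to positions <= i in the decoder's self-attention.
    return 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)

print(create_look_ahead_mask(3))
# [[0. 1. 1.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
```

Its size depends on the decoder sequence length, not on encoder_input, which is another way to see why it is a separate mask from the two padding masks.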
Check my previous explanation with an example; it might add clarity.
Cheers
Understood, thank you