Dec_padding_mask

blackdragon · February 3, 2024, 6:29am

For exercise 5 (next_word function), why can’t we pass output to the create_padding_mask function when defining dec_padding_mask? Isn’t “output” here the decoder input, which is what is supposed to be sent?

arvyzukai · February 6, 2024, 6:01pm

Hi @blackdragon

The name dec_padding_mask might have misled you. The dec_padding_mask is used in the decoder's second Multi-Head Attention block (Cross-Attention).

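It is used to let the decoder know which encoder inputs were padding.
For example, if the English sentence is “I love learning <pad> <pad> <pad> <pad> <pad>”, the Encoder encodes this as a sequence 8 tokens long, with output shape (1, 8, n_units), i.e. (batch_size, seq_length, feature_size).
So, this mask informs the decoder that the tokens from position 4 onward are padding tokens and that it should not pay any attention to them when generating the German sentence. That is why the mask is built from the encoder input rather than from output: in cross-attention the keys and values come from the encoder, so the mask has to match the encoder sequence length.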
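To make this concrete, here is a minimal pure-Python sketch of the idea, not the assignment's actual code. It assumes the padding id is 0 and that the mask uses 1 for real tokens and 0 for padding; the token ids, the stand-in create_padding_mask, and the masked_softmax helper are all made up for illustration.

```python
import math

PAD_ID = 0  # assumption: padding tokens are encoded as id 0


def create_padding_mask(token_ids):
    """Illustrative stand-in: 1.0 for real tokens, 0.0 for padding."""
    return [0.0 if t == PAD_ID else 1.0 for t in token_ids]


def masked_softmax(scores, mask):
    """Softmax over scores; positions with mask == 0 get zero weight."""
    masked = [s if m == 1.0 else -1e9 for s, m in zip(scores, mask)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Encoder input: three real tokens, then padding up to length 8 (made-up ids).
encoder_input = [12, 57, 903, 0, 0, 0, 0, 0]
# Decoder input so far ("output"): a start token plus one generated token.
output = [1, 44]

# The mask is built from the ENCODER input, so its length matches the keys.
dec_padding_mask = create_padding_mask(encoder_input)

# Dummy cross-attention scores: one row per decoder position,
# one column per encoder position.
scores = [[0.5] * len(encoder_input) for _ in output]
weights = [masked_softmax(row, dec_padding_mask) for row in scores]

print(dec_padding_mask)
print(weights[0])
```

Each row of weights has 8 entries (one per encoder position), and the padded positions get zero weight. A mask built from output would have length 2 here, so it would not line up with the 8 encoder positions the decoder attends over.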

Cheers
