Dec_padding_mask

Hi @blackdragon

The name dec_padding_mask might have misled you. The dec_padding_mask is used in the second multi-head attention block (cross-attention):

It is used to let the decoder know which encoder inputs were padding.
For example, if the English sentence is “I love learning <pad> <pad> <pad> <pad> <pad>”, the encoder encodes this sequence as 8 tokens, producing an output of shape (1, 8, n_units) (batch_size, seq_length, feature_size).
So, this mask informs the decoder that the tokens from position 4 onward are padding tokens, and it should pay no attention to them when generating the German sentence.
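Here is a minimal sketch of how such a mask is typically built and applied (the token ids and the pad id of 0 are made up for illustration; the helper follows the create_padding_mask style from the TensorFlow Transformer tutorial):

```python
import tensorflow as tf

# Hypothetical token ids for "I love learning" followed by 5 <pad> tokens
# (assuming the pad token has id 0). Shape: (batch_size=1, seq_length=8).
enc_input = tf.constant([[7, 42, 13, 0, 0, 0, 0, 0]])

def create_padding_mask(seq):
    # 1.0 where the token is padding, 0.0 elsewhere.
    mask = tf.cast(tf.math.equal(seq, 0), tf.float32)
    # Add extra dims so the mask broadcasts over (batch, heads, q_len, k_len).
    return mask[:, tf.newaxis, tf.newaxis, :]  # shape (1, 1, 1, 8)

dec_padding_mask = create_padding_mask(enc_input)

# Inside cross-attention, the mask pushes the padded key positions toward
# -inf before the softmax, so they receive (near-)zero attention weight:
scores = tf.random.normal((1, 1, 1, 8))   # stand-in attention logits
scores += dec_padding_mask * -1e9
weights = tf.nn.softmax(scores, axis=-1)
print(weights)  # positions 4-8 get essentially zero weight
```

So the decoder queries still attend over all 8 encoder outputs, but the softmax gives the padded positions effectively zero weight.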

Cheers