This can be traced by looking at the test Transformer_test, where dec_padding_mask is created from the output sentence:
sentence_lang_a = np.array([[2, 1, 4, 3, 0]])  # input sentence
sentence_lang_b = np.array([[3, 2, 1, 0, 0]])  # output sentence
enc_padding_mask = create_padding_mask(sentence_lang_a)
dec_padding_mask = create_padding_mask(sentence_lang_b)  # built from the output sentence
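For reference, here is a minimal sketch of what create_padding_mask typically looks like (the exact 0/1 convention and number of broadcast axes may differ in the assignment; the only thing that matters for the argument below is the last axis):

import numpy as np
import tensorflow as tf

def create_padding_mask(seq):
    # 1.0 for real tokens, 0.0 for padding (token id 0) -- the convention
    # expected by attention_mask in tf.keras.layers.MultiHeadAttention
    mask = 1.0 - tf.cast(tf.math.equal(seq, 0), tf.float32)
    # add a broadcast axis over query positions: (batch_size, 1, seq_len)
    return mask[:, tf.newaxis, :]

print(create_padding_mask(np.array([[3, 2, 1, 0, 0]])).shape)  # (1, 1, 5)

Whatever the exact convention, the mask's last axis has the length of the sequence it was built from, so dec_padding_mask here ends with dimension 5 = len(sentence_lang_b).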
It is then passed down until it reaches the decoder’s cross-attention block (mha2):
# in the Transformer:
dec_output, attention_weights = self.decoder(output_sentence, enc_output, training,
                                             look_ahead_mask, dec_padding_mask)

# in the Decoder:
x, block1, block2 = self.dec_layers[i](x, enc_output, training,
                                       look_ahead_mask, padding_mask)

# in the DecoderLayer (cross-attention):
mult_attn_out2, attn_weights_block2 = self.mha2(
    query=Q1, key=enc_output, value=enc_output, attention_mask=padding_mask,
    training=training, return_attention_scores=True)
In mha2 the keys come from the encoder output, so their sequence axis has the input sequence length, and the mask therefore has to match that length as well.
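A quick shape check makes this concrete (a self-contained sketch with made-up sizes, not the assignment code): in cross-attention the scores are matmul(Q, K, transpose_b=True), so whatever mask is applied has to broadcast against the key axis.

import tensorflow as tf

batch, target_seq_len, input_seq_len, d_model = 1, 4, 5, 8  # hypothetical sizes

q = tf.random.normal((batch, target_seq_len, d_model))  # queries come from the decoder (output sentence)
k = tf.random.normal((batch, input_seq_len, d_model))   # keys come from enc_output (input sentence)

scores = tf.matmul(q, k, transpose_b=True)
print(scores.shape)  # (1, 4, 5) = (..., seq_len_q, seq_len_k); the last axis is input_seq_len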
Looking at scaled_dot_product_attention(q, k, v, mask), its docstring says:

    mask: Float tensor with shape broadcastable
          to (..., seq_len_q, seq_len_k).

So the mask's last dimension must be seq_len_k, and since k comes from the input sequence, that dimension is the input sentence length. In other words, what is needed here is the padding mask for the input sequence, not the output one.
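The mix-up goes unnoticed in the test because both sample sentences happen to have length 5, so the shapes line up by coincidence. With unequal lengths, a padding mask built from the output sentence would not even broadcast against the cross-attention scores; a sketch using an additive mask of the (1 - mask) * -1e9 form:

import tensorflow as tf

scores = tf.random.normal((1, 4, 5))   # (batch, seq_len_q=4, seq_len_k=5) cross-attention logits
mask_from_input = tf.ones((1, 1, 5))   # built from the input sentence: last axis matches seq_len_k
mask_from_output = tf.ones((1, 1, 4))  # built from the output sentence: wrong last axis

ok = scores + (1.0 - mask_from_input) * -1e9      # broadcasts fine
# bad = scores + (1.0 - mask_from_output) * -1e9  # InvalidArgumentError: incompatible shapes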
Q: Am I missing something, or is there indeed a mistake here (i.e., should dec_padding_mask be built from the input sentence, sentence_lang_a, instead)?