I have a question about Block 1 of DecoderLayer.
The hint suggests that only the look-ahead mask is needed for Block 1, and indeed I passed all the unit tests by passing only the look-ahead mask.
However, I wonder why the padding mask is not needed in the self-attention layer of the decoder. Isn't a padding mask always needed, given that all sequences within a batch are padded to the same length and the padding tokens shouldn't contribute to attention?
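To illustrate what I mean, here is a toy batch (made-up token ids, with 0 as the padding id, which may not match the assignment's setup):

```python
import tensorflow as tf

# Two sequences padded with 0s to a common length of 5. The shapes already
# match, but I would still expect the padded positions to be hidden from
# self-attention by a padding mask.
batch = tf.constant([[12, 45,  7,  0,  0],
                     [ 3, 88, 21, 19,  0]])

# 1 marks a padding position.
print(tf.cast(tf.math.equal(batch, 0), tf.float32))
# [[0. 0. 0. 1. 1.]
#  [0. 0. 0. 0. 1.]]
```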
When I tried the following code, though, a couple of the unit tests in test_transformer failed:
combined_mask = tf.maximum(padding_mask, look_ahead_mask[:, tf.newaxis, :])
mult_attn_out1, attn_weights_block1 = self.mha1(x, x, x, combined_mask, return_attention_scores=True)
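For context, here is roughly how I am picturing the two masks before combining them. The helper functions below are my own sketch, following the shapes and 0/1 convention of the TensorFlow Transformer tutorial, so they may not match exactly what the assignment's create_padding_mask / create_look_ahead_mask return:

```python
import tensorflow as tf

# My own sketch, NOT necessarily the assignment's helpers.
# Convention here: 1 = position to be masked out, 0 = position to keep.
def create_padding_mask(seq):
    # (batch, 1, seq_len): 1 where the token id is 0 (padding)
    return tf.cast(tf.math.equal(seq, 0), tf.float32)[:, tf.newaxis, :]

def create_look_ahead_mask(size):
    # (seq_len, seq_len): 1 strictly above the diagonal (future positions)
    return 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)

tokens = tf.constant([[12, 45, 7, 0, 0]])        # one sequence, last two positions padded
padding_mask = create_padding_mask(tokens)       # (1, 1, 5)
look_ahead_mask = create_look_ahead_mask(5)      # (5, 5)

# Union of the two masks, as in the TensorFlow tutorial.
combined = tf.maximum(padding_mask, look_ahead_mask[tf.newaxis, :, :])  # (1, 5, 5)
print(combined)

# Note: tf.keras.layers.MultiHeadAttention documents attention_mask as shape
# (batch, target_len, source_len), broadcast over the heads, where 1 means
# "attend" and 0 means "don't attend" -- i.e. the opposite convention.
```

If my assumptions about the mask shapes or the 0/1 convention are off, that may well be the source of my confusion, so corrections are welcome.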