C4W2-Assignment Block 1 of DecoderLayer

I have a question regarding Block 1 of DecoderLayer.

The hint suggests that only the look-ahead mask is needed for Block 1, and I did pass all the unit tests by passing only the look-ahead mask.

However, I wonder why the padding mask is not needed in the self-attention layer of the decoder. Isn't a padding mask always needed so that attention ignores the padded positions that make all sequences within a batch the same length?
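
For concreteness, this is the kind of padding mask I have in mind, as a minimal sketch of my own (the assignment's helper may use a different shape or 0/1 convention):

import tensorflow as tf

# Sketch of a padding mask: 1.0 wherever the token id is the pad id (assumed
# to be 0 here), so those positions can be excluded from attention.
def sketch_padding_mask(seq):
    mask = tf.cast(tf.math.equal(seq, 0), tf.float32)
    return mask[:, tf.newaxis, :]  # (batch, 1, seq_len), broadcastable over query positions

# Example: two sequences padded to length 5.
batch = tf.constant([[7, 6, 0, 0, 0],
                     [1, 2, 3, 4, 5]])
print(sketch_padding_mask(batch))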
When I tried the following code, which combines the two masks before the self-attention call, I got a couple of failed unit tests for test_transformer.

# Combine the padding mask and the look-ahead mask elementwise, then pass the
# result as the attention mask of the decoder's first (self-attention) block.
combined_mask = tf.maximum(padding_mask, look_ahead_mask[:, tf.newaxis, :])
mult_attn_out1, attn_weights_block1 = self.mha1(x, x, x, combined_mask, return_attention_scores=True)
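
In case it matters, this is the convention I was assuming when combining them. Below is a toy illustration with made-up tensors (not the assignment's actual masks), where 1 marks a position to be masked out, so the elementwise maximum masks a position if either mask wants it masked:

import tensorflow as tf

# Toy illustration of the combination I attempted, assuming 1 = mask out.
padding_mask = tf.constant([[[0., 0., 1.]]])      # (1, 1, 3): last position is padding
look_ahead_mask = tf.constant([[[0., 1., 1.],
                                [0., 0., 1.],
                                [0., 0., 0.]]])   # (1, 3, 3): upper triangle = future positions
combined = tf.maximum(padding_mask, look_ahead_mask)
print(combined)  # masks the future positions and the padded position for every query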

Don't use any hard-coded method of merging the masks.

The reason Block 1 uses only the look-ahead mask is to make the network ignore the future values in the decoder input (the target input).
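
To make that concrete, here is a minimal sketch of a look-ahead mask (the helper name and the 0/1 convention in the assignment may differ). Position i is only allowed to attend to positions up to i, so the model cannot peek at future target tokens:

import tensorflow as tf

# Sketch of a look-ahead (causal) mask. Here 1 marks a future position that
# must be ignored; the assignment's helper may use the opposite convention.
def sketch_look_ahead_mask(size):
    return 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)

print(sketch_look_ahead_mask(4))
# [[0. 1. 1. 1.]
#  [0. 0. 1. 1.]
#  [0. 0. 0. 1.]
#  [0. 0. 0. 0.]]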

Kindly share a screenshot of the failed test without sharing any part of the graded function's code.