Week 2 Assignment - can't figure out the padding_mask step. Any hints?

I’m really struggling to clear this error. I understand why the padding_mask is needed, but I don’t know where in the routine to apply it.

GRADED FUNCTION: DecoderLayer…

Failed test case: Wrong values in ‘out’ when we mask the last word. Are you passing the padding_mask to the inner functions?
Expected: [1.1297308, -1.6106694, 0.32352272, 0.15741566]
Got: [ 1.2026718 -1.5374368 0.4260614 -0.09129634]


Hi @julianhatwell

The padding mask for the Decoder should be applied in the DecoderLayer itself (Exercise 2), after this code comment:

        # BLOCK 2
        # calculate self-attention using the Q from the first block and K and V from the encoder output. 
        # Dropout will be applied during training
        # Return attention scores as attn_weights_block2 (~1 line) 
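For reference, the call there could look something like this. It's only a minimal sketch: `Q1` (the output of Block 1) and `mult_attn_out2` are placeholder names, and I'm assuming `self.mha2` is a `tf.keras.layers.MultiHeadAttention` as in the notebook. The point is simply that padding_mask goes in as the attention_mask:

    mult_attn_out2, attn_weights_block2 = self.mha2(
        query=Q1,                      # Q from the first block
        value=enc_output,              # V from the encoder output
        key=enc_output,                # K from the encoder output
        attention_mask=padding_mask,   # hide the Encoder's padded positions
        return_attention_scores=True,  # needed to return attn_weights_block2
    )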

and also in the Decoder class (Exercise 3), after this code comment:

            # pass x and the encoder output through a stack of decoder layers and save the attention weights
            # of block 1 and 2 (~1 line)
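Here the one line is just forwarding all the masks into each DecoderLayer as you loop over the stack. Again a sketch with assumed names (`self.dec_layers` holding the layer stack; the dictionary key names are illustrative):

    for i in range(self.num_layers):
        # each DecoderLayer receives both masks; padding_mask reaches Block 2
        x, block1, block2 = self.dec_layers[i](
            x, enc_output, training, look_ahead_mask, padding_mask
        )
        attention_weights[f'decoder_layer{i+1}_block1_self_att'] = block1
        attention_weights[f'decoder_layer{i+1}_block2_decenc_att'] = block2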

In other words, to pass this unit test, you need the padding mask in both of these places. The padding mask tells the Decoder layers’ Cross Attention which of the Encoder’s input tokens were padding, so that the Decoder can better decide which Encoder inputs are relevant.
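If it helps to see the convention: the mask itself is just ones over real tokens and zeros over padding, broadcast over the query positions. The notebook provides its own helper, so treat this only as an illustration (assuming pad id 0, and the Keras convention that 1 means "attend"):

    import tensorflow as tf

    def create_padding_mask(seq):
        # 1.0 where the token is real, 0.0 where it is the pad id (assumed to be 0)
        mask = tf.cast(tf.math.not_equal(seq, 0), tf.float32)
        # add an axis so the mask broadcasts over query positions: (batch, 1, key_len)
        return mask[:, tf.newaxis, :]

    # e.g. tf.constant([[7, 6, 0, 0]]) -> [[[1., 1., 0., 0.]]]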

Cheers

OK, the unit tests pass now. Thank you. I was over-thinking the process; given the information in your answer, it turned out to be the simplest possible step.
