Please explain the comment: Notice that both encoder and decoder padding masks are equal

I do not understand this comment (in the train_step function of the C4W2 Assignment). Why are the encoder and decoder padding masks equal? In other words, why
enc_padding_mask = dec_padding_mask = create_padding_mask(encoder_input)

Hi @Lenny_Tevlin

It’s a good question.
In simple words, we inform the decoder not to pay attention to the padding tokens of the document being summarized. Both masks are built from the encoder input because the padding lives in the source sequence: the encoder uses the mask in its self-attention, and the decoder reuses the same mask in its cross-attention over the encoder's outputs. That is why the two masks are equal.
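As a minimal sketch of this idea (NumPy here rather than the assignment's TensorFlow code, and assuming pad token id 0 and the convention that 1 marks a masked position):

```python
import numpy as np

def create_padding_mask(seq, pad_id=0):
    # 1.0 where the token is padding, 0.0 elsewhere;
    # shaped (batch, 1, 1, seq_len) so it broadcasts across attention heads
    mask = (seq == pad_id).astype(np.float32)
    return mask[:, np.newaxis, np.newaxis, :]

encoder_input = np.array([[5, 7, 9, 0, 0]])  # toy source document, 0 = pad

# Both masks hide the same source padding: the encoder in self-attention,
# the decoder in cross-attention over the encoder's outputs -- hence
# enc_padding_mask = dec_padding_mask = create_padding_mask(encoder_input)
enc_padding_mask = dec_padding_mask = create_padding_mask(encoder_input)
print(enc_padding_mask)  # the last two positions (the pads) are masked
```

The decoder's own target-side padding is handled separately, inside the mask applied to its self-attention.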

Also note that this is different from the look_ahead_mask (causal mask), where the decoder is only allowed to pay attention to itself and its previous tokens.
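For contrast, a sketch of the causal mask (again NumPy, same convention that 1 marks a position to be ignored):

```python
import numpy as np

def create_look_ahead_mask(size):
    # 1.0 strictly above the diagonal: position i must not attend
    # to any position j > i (the "future" tokens)
    return np.triu(np.ones((size, size), dtype=np.float32), k=1)

# For a 4-token target sequence, row i shows what position i must ignore:
# row 0 ignores tokens 1..3; row 3 ignores nothing (it sees its full prefix).
look_ahead_mask = create_look_ahead_mask(4)
print(look_ahead_mask)
```

So the padding mask depends on the content of the input (where the pads are), while the look-ahead mask depends only on position.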

Check my previous explanation with an example; it might add clarity.

Cheers


Understood, thank you
