Decoder Padding Mask in the Transformer Architecture

Hi Mentor,

In the Transformer architecture, why is a padding mask applied at the decoder layer?

`dec_padding_mask` – Boolean mask for the second multi-head attention layer

Hi Anbu,

The padding mask is needed in the decoder's second multi-head attention block because that is where the keys (K) and values (V) come from the encoder output. The encoder input may contain padding tokens, and those positions carry no information, so the mask prevents the decoder from attending to them. In other words, `dec_padding_mask` is built from the *encoder's* input sequence, not the decoder's.
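
Here is a minimal sketch of how such a mask can be built and where it acts (assuming TensorFlow and a padding token id of 0; the function name `create_padding_mask` and the toy sequences below are illustrative, not taken from your assignment):

```python
import tensorflow as tf

def create_padding_mask(seq):
    """Return 1.0 at padding positions (token id 0 assumed), 0.0 elsewhere."""
    mask = tf.cast(tf.math.equal(seq, 0), tf.float32)
    # Broadcast to (batch, 1, 1, seq_len_k) so it lines up with
    # attention logits of shape (batch, num_heads, seq_len_q, seq_len_k).
    return mask[:, tf.newaxis, tf.newaxis, :]

# Encoder input with trailing padding (0s).
enc_input = tf.constant([[7, 6, 5, 0, 0],
                         [1, 2, 0, 0, 0]])

# The mask is built from the ENCODER input. It is used in both the
# encoder self-attention and the decoder's second attention block,
# because in both cases K and V come from the padded encoder sequence.
dec_padding_mask = create_padding_mask(enc_input)

# Inside scaled dot-product attention the mask is added to the logits
# before the softmax, pushing padded key positions toward zero weight:
#   scaled_logits += dec_padding_mask * -1e9
#   weights = tf.nn.softmax(scaled_logits, axis=-1)
```

Note that Q in this block comes from the decoder, so no look-ahead mask is needed here; the look-ahead mask belongs to the decoder's first (self-)attention block.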