Hi Mentor,
In the Transformer architecture, why is the padding mask applied at the decoder layer?
dec_padding_mask – Boolean mask for the second multihead attention layer
Hi Anbu,
The padding mask needs to be applied to the K and V coming from the encoder. In the decoder's second (cross-) attention block, the queries (Q) come from the decoder, but the keys and values come from the encoder output, so the mask has to hide the padded positions of the source sequence. That is why `dec_padding_mask` is built from the encoder input rather than the decoder input.
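To make this concrete, here is a minimal sketch in the style of the TensorFlow Transformer tutorial (where the name `dec_padding_mask` comes from). The function names follow that tutorial, but the tensor shapes and toy values below are illustrative assumptions, not the tutorial's exact code:

```python
import tensorflow as tf

def create_padding_mask(seq):
    # 1.0 where the token is padding (id 0), 0.0 elsewhere.
    mask = tf.cast(tf.math.equal(seq, 0), tf.float32)
    # Shape (batch, 1, 1, seq_len_k) so it broadcasts over heads
    # and over every query position.
    return mask[:, tf.newaxis, tf.newaxis, :]

def scaled_dot_product_attention(q, k, v, mask=None):
    # q: (..., seq_len_q, depth), k/v: (..., seq_len_k, depth)
    scores = tf.matmul(q, k, transpose_b=True)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scores = scores / tf.math.sqrt(dk)
    if mask is not None:
        # Push padded key positions toward -inf so softmax
        # assigns them ~0 attention weight.
        scores += (mask * -1e9)
    weights = tf.nn.softmax(scores, axis=-1)  # (..., seq_len_q, seq_len_k)
    return tf.matmul(weights, v), weights

# Toy example: a source sentence with two pad tokens at the end.
inp = tf.constant([[7, 4, 9, 0, 0]])          # (batch=1, seq_len=5)
dec_padding_mask = create_padding_mask(inp)   # (1, 1, 1, 5)

# In cross-attention, K and V come from the encoder output and
# Q comes from the decoder (here with a different length, 3).
enc_output = tf.random.normal((1, 1, 5, 8))   # (batch, heads, seq_len_k, depth)
dec_queries = tf.random.normal((1, 1, 3, 8))  # (batch, heads, seq_len_q, depth)

out, attn = scaled_dot_product_attention(
    dec_queries, enc_output, enc_output, dec_padding_mask)
print(attn[0, 0])  # the last two columns (padded source positions) are ~0
```

Note that the mask's last dimension matches the encoder sequence length, not the decoder's: each decoder query still attends over all source positions, it just gets zero weight on the padded ones.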