Hi Mentor,
We have a couple of questions. Could you please help clarify them?
1. mult_attn_out1, attn_weights_block1 = self.mha1(None, None, None, None, return_attention_scores=True)  # (batch_size, target_seq_len, d_model)
- Why is the training argument not required to be passed here?
- The last dimension of the output shape is d_model. What does that mean?