Decoder Layer Clarification Assignment

Hi Mentor,

We have a couple of doubts. Please help us clarify them.

1. mult_attn_out1, attn_weights_block1 = self.mha1(None, None, None, None, return_attention_scores=True) # (batch_size, target_seq_len, d_model)

   Why is the training argument not required in this call?

2. The last dimension of the output shape here is d_model. What does that mean?

Hi Anbu,

The training argument is in fact required: it controls whether dropout inside the attention layer is active, so it should be passed through from the layer's own training flag. The output shape is (batch_size, target_seq_len, d_model). d_model is the dimension of the feature vector produced for each target position; these extracted meaning features are what the final layer uses to select the next word from the target vocabulary.
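To make this concrete, here is a minimal sketch of such a call using Keras's tf.keras.layers.MultiHeadAttention. The sizes (batch_size=2, target_seq_len=5, d_model=8, num_heads=2) are illustrative assumptions, not the assignment's values:

```python
import tensorflow as tf

batch_size, target_seq_len, d_model, num_heads = 2, 5, 8, 2

# Attention block with dropout, so the training flag actually matters.
mha1 = tf.keras.layers.MultiHeadAttention(
    num_heads=num_heads, key_dim=d_model // num_heads, dropout=0.1
)

x = tf.random.uniform((batch_size, target_seq_len, d_model))

# Self-attention: query, value, and key are all x.
# training=True enables dropout on the attention weights; during inference
# you would pass training=False so dropout is skipped.
mult_attn_out1, attn_weights_block1 = mha1(
    x, x, x, return_attention_scores=True, training=True
)

# One d_model-dimensional feature vector per target position.
print(mult_attn_out1.shape)      # (batch_size, target_seq_len, d_model)
# One attention map per head.
print(attn_weights_block1.shape) # (batch_size, num_heads, target_seq_len, target_seq_len)
```

Note that the output keeps the input's (batch_size, target_seq_len) leading dimensions; only the per-position features are mixed across positions by attention.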