Decoder Layer Clarification Assignment

Anbu · May 14, 2022, 10:51am

Hi Mentor,

We have a couple of questions. Could you please help clarify them?

1. mult_attn_out1, attn_weights_block1 = self.mha1(None, None, None, None, return_attention_scores=True) # (batch_size, target_seq_len, d_model)

Why does the training argument not need to be passed here?

The comment gives the output shape as (batch_size, target_seq_len, d_model). What does d_model mean?

reinoudbosch · May 16, 2022, 11:19pm

Hi Anbu,

The training argument is required. The output shape is (batch_size, target_seq_len, d_model). Here d_model is the length of the feature vector produced for each position in the target sequence; these features capture the extracted meaning and are used later to select the next word from the target vocabulary.
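To make the shape concrete, here is a minimal NumPy sketch of masked multi-head self-attention. It is not the assignment's Keras layer: the function name `self_attention`, the random untrained weights, and the sizes (batch of 2, 5 target positions, d_model of 8, 4 heads) are all assumptions chosen for illustration. It only shows that each target position ends up with a vector of length d_model.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, num_heads, rng):
    """Masked multi-head self-attention with random (untrained) weights.

    Hypothetical helper for illustration only.
    """
    batch_size, seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    depth = d_model // num_heads

    # Random projection matrices stand in for learned parameters.
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        for _ in range(4)
    )

    def split_heads(t):
        # (batch, seq, d_model) -> (batch, heads, seq, depth)
        return t.reshape(batch_size, seq_len, num_heads, depth).transpose(0, 2, 1, 3)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product scores: (batch, heads, seq, seq)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(depth)

    # Look-ahead mask: position i may attend only to positions <= i.
    look_ahead = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    scores = np.where(look_ahead, scores, -1e9)
    weights = softmax(scores, axis=-1)

    # Merge heads back: (batch, heads, seq, depth) -> (batch, seq, d_model)
    context = weights @ v
    merged = context.transpose(0, 2, 1, 3).reshape(batch_size, seq_len, d_model)
    return merged @ w_o, weights


rng = np.random.default_rng(0)
batch_size, target_seq_len, d_model, num_heads = 2, 5, 8, 4
x = rng.standard_normal((batch_size, target_seq_len, d_model))

out, attn = self_attention(x, num_heads, rng)
print(out.shape)   # (2, 5, 8): (batch_size, target_seq_len, d_model)
print(attn.shape)  # (2, 4, 5, 5): (batch_size, num_heads, target_seq_len, target_seq_len)
```

The first return value has the same layout as mult_attn_out1 in the question, and the second corresponds to the per-head attention weights.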
