Programming Assignment: Transformers Architecture with TensorFlow encoderlayer

maxma · January 23, 2024, 2:17am

Hi,
I am confused about the inputs to multi-head attention.
They should be Q, K and V, right?
self.mha = MultiHeadAttention(num_heads=num_heads,
key_dim=embedding_dim,
dropout=dropout_rate)
In the programming assignment, Exercise 4, class EncoderLayer has:
def call(self, x, training, mask):

The call function only receives x as data input,
so what values should be passed to self.mha?
I wrote self_mha_output = self.mha(x, return_attention_scores=True, training=True)

and got an error like TypeError: call() missing 1 required positional argument: 'value'
Sorry, I still do not quite understand all the details of the encoder-decoder architecture.
Can anyone help me?

thanks

lukmanaj · January 23, 2024, 2:45am

Hi @maxma ,
As a hint, self.mha() should be given four arguments here: one each for Q, K and V, plus the mask. In this specific case, x plays the role of Q, K and V, because in the encoder's self-attention Q, K and V are all the same tensor.

TMosh · January 23, 2024, 8:22am

So, just to be clear, you have to use ‘x’ three times.
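
As a rough illustration of the hint (a standalone sketch, not the assignment's reference solution), the snippet below calls a Keras MultiHeadAttention layer with the same tensor for all three inputs. The toy sizes, the all-True mask, and the variable names are assumptions made for the example. Note that the Keras call signature takes query first and value second, with key as an optional third argument, which is why passing only x raises the missing 'value' error.

```python
import tensorflow as tf

# Toy sizes chosen only for illustration (not the assignment's values).
batch_size, seq_len, embedding_dim, num_heads = 2, 5, 8, 2

mha = tf.keras.layers.MultiHeadAttention(
    num_heads=num_heads, key_dim=embedding_dim, dropout=0.1
)

# Stand-in for the encoder layer input x: (batch, sequence, embedding).
x = tf.random.uniform((batch_size, seq_len, embedding_dim))

# Assumed mask for the example: every position may attend to every position.
mask = tf.cast(tf.ones((batch_size, seq_len, seq_len)), tf.bool)

# Self-attention: the same tensor is used as query, value and key.
self_mha_output = mha(x, x, x, attention_mask=mask, training=False)

print(self_mha_output.shape)  # (2, 5, 8)
```

Inside EncoderLayer.call, the same pattern would use the method's own x, mask and training arguments rather than the toy tensors above.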
