C5 W4 A1 EncoderLayer arguments for self.mha

I have spent almost a day trying to figure out the arguments for the function self.mha, and I am not able to make any progress.

I get an error saying the “value” argument is missing.
I just cannot figure out what else I need to pass to self.mha(…).

I can see the hints that ask me to pass query, key, value and mask. But where do I get them from? I can see that __init__ gets (embedding_dim, num_heads, fully_connected_dim) and the call function gets (x, training, mask) as arguments. That's it. Where do we get the query, key and value matrices from?

Edit: After going through the following, I was able to solve it.

  1. tf.keras.layers.MultiHeadAttention  |  TensorFlow v2.12.0
  2. Transformer model for language understanding  |  Text  |  TensorFlow
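For anyone hitting the same “value argument is missing” error: per the tf.keras.layers.MultiHeadAttention docs linked above, the layer's call signature is (query, value, key=None, …), so passing only x leaves value unset. A minimal self-attention sketch (the layer sizes here are toy values, not the assignment's):

```python
import tensorflow as tf

# Toy multi-head attention layer; num_heads and key_dim are arbitrary here.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)

# A toy batch of shape (batch, seq_len, embedding_dim).
x = tf.random.normal((1, 5, 8))

# Self-attention: the SAME tensor x is passed as query, value, and key.
out = mha(query=x, value=x, key=x)
print(out.shape)  # (1, 5, 8) — output keeps the query's shape
```

Calling mha(x) alone fails precisely because value is a required second argument.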


Can you please give a clue, or link to any documentation, as to why q, k and v are all the same in self.mha()?

I don't understand why the q, k and v matrices are the same.
In the video lecture the professor says that q is like a question, k is like a key and v is the value for a particular word. So how can all three be the same?

Andrew’s lecture on Self-Attention doesn’t really cover this topic. Most of that lecture is simply about the Attention method itself.

The details of self-attention don't appear until you get to the Transformer video at 3:12, and even there without much explanation:

At item 1), you can see that the self-attention MHA uses X for Q, K, and V.
At item 2), you can see that K and V come from the Encoder, and Q comes from another self-MHA layer.
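To make item 1) concrete: “uses X for Q, K, and V” means the same input feeds all three roles, but the layer owns three separate learned projection matrices, so the actual Q, K, V tensors still come out different. A minimal NumPy sketch of single-head scaled dot-product self-attention (toy sizes, random weights standing in for learned ones):

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8          # toy sizes, not the assignment's

# One input sequence X; in self-attention the SAME X feeds all three roles.
X = rng.normal(size=(seq_len, d_model))

# Three separate projection matrices (learned during training),
# so Q, K, V differ even though they all start from X.
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))

Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d_model)   # (seq_len, seq_len)
weights = softmax(scores)             # each row sums to 1
output = weights @ V                  # (seq_len, d_model)

print(output.shape)   # (4, 8)
```

So when you pass x as query, key and value, you are choosing the input, not the final Q/K/V matrices; those are produced inside the layer by its own weights.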

Ahh I see.
Yes, from 1) we can see that mha uses X for Q, K and V.

a) But what is the intuition or reasoning behind this?
b) Is it that Q, K and V get built up during the training phase?
c) Is back-propagation used here during training?
(I think back-propagation should be there, but I'm not quite sure where it fits in.)
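On b) and c): the projection matrices that turn X into Q, K and V are ordinary trainable weights, so yes, back-propagation updates them like any other layer. A toy NumPy check (a finite-difference gradient, standing in for real back-propagation) showing that the loss actually responds to a change in one entry of Wq:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_loss(Wq, Wk, Wv, X):
    # toy scalar loss: just the sum of the self-attention output
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    out = softmax(Q @ K.T / np.sqrt(X.shape[1])) @ V
    return out.sum()

rng = np.random.default_rng(1)
seq_len, d_model = 3, 4
X = rng.normal(size=(seq_len, d_model))
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))

# finite-difference estimate of d(loss)/d(Wq[0, 0])
eps = 1e-6
Wq_plus = Wq.copy()
Wq_plus[0, 0] += eps
grad_00 = (attention_loss(Wq_plus, Wk, Wv, X)
           - attention_loss(Wq, Wk, Wv, X)) / eps

print(abs(grad_00) > 0)   # True: the loss moves when Wq moves
```

Since the gradient is non-zero, an optimizer can adjust Wq (and likewise Wk, Wv) by gradient descent; that is where back-propagation fits in.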

Do you know of any blogs or resources to help understand this?

Thanks for your response,

Maybe try this: