C5 W4 A1 EncoderLayer arguments for self.mha

Suhas_K_R · May 17, 2023, 8:32am

Hello,
I have spent almost a day trying to figure out the arguments for the function self.mha, and I am not able to make any progress.

I get an error saying the “value” argument is missing.
I just can't figure out what else I need to pass to self.mha(…).

I can see the hints that ask me to pass query, key, value and mask. But where do I get them from? I can see that the init gets (embedding_dim, num_heads, fully_connected_dim) and the call function gets (x, training, mask) as arguments. That's it. Where do we get the query, key and value matrices from?

Edit: After going through the following, I was able to solve it.

Thanks,
Suhas

Suhas_K_R · May 17, 2023, 12:24pm

@TMosh
Can you please give some clue or link any documentation as to why q, k and v are the same when passed to self.mha()?

I don't understand why the q, k and v matrices are the same.
In the video lecture the professor says that q is like a question, k is like a key and v is the value for a particular word. How can all three be the same?

TMosh · May 18, 2023, 12:42am

Andrew’s lecture on Self-Attention doesn’t really cover this topic. Most of that lecture is simply about the Attention method itself.

The details of self-attention don’t appear (and really without much explanation) until you get to the Transformer video at 3:12:

At item 1), you can see that the self-attention MHA uses X for Q, K, and V.
At item 2), you can see that K and V come from the Encoder, and Q comes from another self-MHA layer.
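To make the two cases above concrete, here is a minimal, dependency-free sketch of single-head attention. It is not the assignment's code and not the Keras implementation: `ToyAttention`, its weight names, and the random inputs are hypothetical, and only the (query, value, key) argument order is meant to mirror how `tf.keras.layers.MultiHeadAttention` is called. The point it tries to show is that passing the same X three times does not make Q, K and V identical inside the layer, because each one goes through its own weight matrix first. In a real model those weights would be learned by backpropagation; here they are just random.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ToyAttention:
    """Hypothetical single-head attention layer with separate Q, K, V projections."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Stand-ins for trainable weights; a real layer updates these in training.
        self.w_q = random_matrix(dim, dim, rng)
        self.w_k = random_matrix(dim, dim, rng)
        self.w_v = random_matrix(dim, dim, rng)

    def __call__(self, query, value, key):
        # The same input can be passed three times; the projections differ.
        q = matmul(query, self.w_q)
        k = matmul(key, self.w_k)
        v = matmul(value, self.w_v)
        scale = math.sqrt(len(q[0]))
        scores = matmul(q, transpose(k))
        weights = [softmax([s / scale for s in row]) for row in scores]
        return matmul(weights, v), weights


attn = ToyAttention(dim=4)
x = random_matrix(3, 4, random.Random(1))        # 3 tokens, embedding size 4
enc_out = random_matrix(5, 4, random.Random(2))  # pretend encoder output, 5 tokens

# Item 1: self-attention, X is used for Q, K and V.
self_out, self_weights = attn(x, x, x)

# Item 2: Q comes from the decoder side, K and V come from the encoder.
cross_out, cross_weights = attn(query=x, value=enc_out, key=enc_out)

print(len(self_out), len(self_out[0]))            # 3 4
print(len(cross_weights), len(cross_weights[0]))  # 3 5
```

Each row of `self_weights` sums to 1 and says how much one token attends to every token in the same sequence, which is what "self" refers to.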

Suhas_K_R · May 18, 2023, 5:26am

Ahh I see.
Yes, from 1) we can see that mha uses X for Q, K and V.

a) But what is the intuition or reasoning behind this?
b) Are Q, K and V built during the training phase?
c) Do we have backpropagation here during training or not?
(I think backpropagation should be there, but I'm not quite sure where it fits in.)

Do you know of any blogs or resources to help understand this?

Thanks for your response,
Suhas

TMosh · May 18, 2023, 6:37am

Maybe try this:
https://peterbloem.nl/blog/transformers
