Hi Sir,
I can't understand why, when computing self-attention, Q, K and V all have to be the same. What does that mean? Can anyone please help to explain?
Hi, @Anbu!
Check this post. I think we have already answered that question before. If you need some clarification after reading it, just ask and I’ll be here to help you.
Sir, sorry, I'm not able to understand it from the post. But I'm confident about the architecture. Can you please explain what it means that query, key and value are the same? Please, I'm stuck, kindly explain.
Also, if query, key and value are the same, then why is key_dim = embedding dimension in the assignment code?
```python
self.mha = MultiHeadAttention(num_heads=num_heads,
                              key_dim=embedding_dim,
                              dropout=dropout_rate)
```
It means those three tensors have the same dimensions. If you encode your embeddings as, say, vectors of 1024 values, then q, k and v will each be vectors of 1024 values.
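Here is a minimal sketch of what that looks like in practice (my own toy example, assuming the TensorFlow/Keras MultiHeadAttention layer from the assignment; the batch size, sentence length and number of heads are just illustrative):

```python
import tensorflow as tf

embedding_dim = 1024  # matches the 1024-value embeddings mentioned above
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=embedding_dim)

# x: a batch of 2 sentences, 5 tokens each, embedded as 1024-value vectors
x = tf.random.uniform((2, 5, embedding_dim))

# Self-attention: the same tensor is passed as query, value and key
out = mha(query=x, value=x, key=x)
print(out.shape)  # (2, 5, 1024) -- same shape as the input
```

The only thing that makes it "self"-attention is that the same x is used for all three arguments.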
Okay, sir.
Then, if q = k = v = X, why isn't Q = W·X happening? Where does it actually happen?
Also, why do we need to apply a mask in the encoder layer?
I was also struggling with the question of why "q = k = v = X" for self-attention. For me the key to understanding it is: here "q", "k" and "v" refer to the arguments passed when calling the MultiHeadAttention layer, i.e. they are the embeddings of the words used to calculate the corresponding Q, K and V inside the layer, not the final values of Q, K and V themselves.
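To make that concrete, here is a small sketch (my own illustration with made-up sizes, not the assignment code) showing where the projections actually live: the same x goes into the layer three times, and the layer's own learned weight matrices turn it into different Q, K and V.

```python
import tensorflow as tf

embedding_dim, num_heads = 1024, 2  # illustrative sizes
mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)

x = tf.random.uniform((2, 5, embedding_dim))
_ = mha(query=x, value=x, key=x)  # call the layer once so it builds its weights

# The layer owns separate projection weights for query, key and value.
# Internally it computes Q = x @ W_q, K = x @ W_k and V = x @ W_v before the
# scaled dot-product attention, so Q, K and V end up different even though
# the same x is fed in three times.
for w in mha.weights:
    print(w.name, w.shape)
```

If you run it, you should see weights with names along the lines of query/kernel, key/kernel and value/kernel; those are the W matrices used in the Q = W·X step, so the projection does happen, just inside the layer rather than in the code you write for the assignment.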