Hello
The transformers assignment states at one point: "Remember that to compute self-attention Q, V and K should be the same." I don't understand why. The lectures do not mention anything about making Q, V, and K the same when computing self-attention.
Thanks for the clarification
Regards,
Boris M.
The lectures on this topic are rather incomplete. We’ve requested some updates, but they’re not available yet.
By definition, “self-attention” means you use exactly the same input data for Q, K, and V: the queries, keys, and values are all derived from the same sequence.
It’s not well-explained in the lectures.
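To make that concrete, here is a minimal NumPy sketch (illustrative only, not the assignment’s actual code; the function and weight names are made up for this example). The same input sequence x is used to produce Q, K, and V, each through its own projection, and then scaled dot-product attention is applied:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                 # (seq_len, seq_len) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # weighted sum of the values

seq_len, d_model = 5, 8
x = np.random.randn(seq_len, d_model)   # one input sequence -- the "same data"

# Separate projections for Q, K, and V (random placeholders standing in for learned weights)
W_q = np.random.randn(d_model, d_model)
W_k = np.random.randn(d_model, d_model)
W_v = np.random.randn(d_model, d_model)

# Self-attention: Q, K, and V are all derived from the same input x
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)  # (5, 8)
```

In other words, "Q, K, and V should be the same" refers to the data passed into the attention layer; in most implementations the layer still learns separate projection weights for each of them internally.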
Hi,
It’s been several months since this question was raised here, but I still have the same question (I’m not sure whether anything in the lectures was updated to cover it). So, why do Q, K, and V need to be the same for self-attention? How does it make sense to have three matrices but make them all the same (i.e., isn’t that just a waste of parameters)?
Thanks,
Tal
Relatedly, I would also like to understand why, in the decoder of the same assignment (week 4, assignment 1), k and v are the same (both are enc_output).
It would be great if someone could provide a general explanation of when q, k, and v are the same and when they are all distinct from each other. I have seen several questions about this in the forum, but no answers that actually clarify it…
Thanks!
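For what it’s worth, here is a rough, self-contained sketch of the two call patterns (again just illustrative; the attention function, dec_x, and the shapes are invented for this example, not taken from the assignment). In self-attention, q, k, and v all come from the same sequence; in the decoder’s encoder-decoder attention, the queries come from the decoder while the keys and values both come from enc_output, which is why k and v are the same there:

```python
import numpy as np

def attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V -- the same computation in both cases below."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

d_model = 8
dec_x = np.random.randn(4, d_model)        # decoder-side sequence (target tokens so far)
enc_output = np.random.randn(6, d_model)   # encoder output for the source sequence

# Decoder self-attention: q, k, and v all come from the decoder's own sequence
# (learned projections and the causal mask are omitted here for brevity).
self_att = attention(dec_x, dec_x, dec_x)

# Encoder-decoder ("cross") attention: queries come from the decoder,
# while keys and values both come from enc_output -- hence k == v == enc_output.
cross_att = attention(dec_x, enc_output, enc_output)

print(self_att.shape, cross_att.shape)     # (4, 8) (4, 8)
```

Roughly: whenever q, k, and v all come from one sequence it is self-attention; when the queries come from one place and the keys/values from another (as with enc_output in the decoder), it is encoder-decoder attention.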