[Week 4] - Lab - Self Attention

AfoDubhashi · June 4, 2021, 1:53am

Hi,

I’m not quite able to understand how the dimensions of q, k and v are decided. From my understanding, q and k must have the same dimensions as x; however, this doesn’t seem to be the case. Any insights would be appreciated.

Cheers!

edwardyu · June 4, 2021, 4:46am

I assume that by X you mean the input sequence after embedding; in other words, each X<i> is an embedding vector. If the attention layer sits in the middle of the network, X is instead the output sequence of the previous layer.

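Q has one row per position of X, so its sequence length matches X, while K and V depend on what is being attended to. For self-attention (e.g., in the encoder), Q, K and V are all computed from X, so they all have the same sequence length as X. However, if the layer attends to another network, as part of the decoder does when it attends to the encoder output, K and V take their sequence length from that other network. In every case Q and K must share the same last (feature) dimension so that their dot product is defined, and that feature dimension does not have to equal the embedding size of X.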
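To make the shapes concrete, here is a minimal NumPy sketch of scaled dot-product attention. It is not the lab's code: the sizes (`d_model`, `d_k`, `d_v`), the random projection matrices `W_q`, `W_k`, `W_v`, and the names `x_enc` and `x_dec` are all assumptions chosen for illustration.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q: (seq_q, d_k), k: (seq_k, d_k), v: (seq_k, d_v) -> output (seq_q, d_v)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                        # (seq_q, seq_k)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model, d_k, d_v = 8, 4, 6             # illustrative sizes (assumed)
x_enc = rng.normal(size=(5, d_model))   # "encoder" sequence: 5 positions
x_dec = rng.normal(size=(3, d_model))   # "decoder" sequence: 3 positions

# Hypothetical learned projections; random here just to show the shapes.
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_v))

# Self-attention: Q, K and V all come from the same sequence.
out_self, w_self = scaled_dot_product_attention(x_enc @ W_q, x_enc @ W_k, x_enc @ W_v)

# Attention over another sequence: Q from the decoder, K and V from the encoder.
out_cross, w_cross = scaled_dot_product_attention(x_dec @ W_q, x_enc @ W_k, x_enc @ W_v)

print(out_self.shape, w_self.shape)     # (5, 6) (5, 5)
print(out_cross.shape, w_cross.shape)   # (3, 6) (3, 5)
```

Note that the output always has one row per query, while the number of columns in the weight matrix follows the number of keys.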

Just like Andrew mentioned in the lecture, there are analogies between RNN attention and transformer attention:

In the picture, q plays the role of t and k plays the role of t', and the two do not need to cover the same number of positions.
BTW, another way to think of the picture is that alpha is a probability distribution (the attention weights), and A is just the weighted sum of v under those weights.
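A small numeric sketch of that last point, for a single query. The vectors below are made-up values, not taken from the lecture or the lab; the only thing being shown is that alpha sums to 1 and that A is the alpha-weighted sum of the rows of v.

```python
import numpy as np

# One query and three key/value pairs (made-up numbers for illustration).
q = np.array([1.0, 0.0])
k = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
v = np.array([[10.0, 0.0],
              [0.0, 10.0],
              [5.0, 5.0]])

# Similarity of the query to each key, scaled by sqrt(d_k).
scores = k @ q / np.sqrt(q.shape[0])

# Softmax turns the scores into a probability distribution over the keys.
alpha = np.exp(scores - scores.max())
alpha = alpha / alpha.sum()

# A is the weighted sum of the value vectors, written two equivalent ways.
A = alpha @ v
A_explicit = sum(alpha[i] * v[i] for i in range(len(v)))

print(alpha, alpha.sum())
print(A)
```

The key that matches the query best gets the largest weight, so its value vector contributes most to A.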
