I’m not sure the scaled_dot_product_attention function can be done correctly.
The doctext says q, k, and v must have matching leading dimensions, and that k and v must have matching penultimate dimensions; that's supposedly the only way to complete this exercise. Well, the given tensors simply don't match:
q shape (3, 4)
k shape (4, 4)
v shape (4, 2)
At no point is there any indication that I am meant to manipulate the dimensions of q, k, or v in any way other than transposing them.
qk can only be shape (3, 4) or shape (4, 3)
Therefore qk times v can only be shape (3, 2) or shape (2, 3)
The required output shape is (3, 4)
I passed all the previous functions. Where's my mistake? I'm really not sure what I'm missing.
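Here is a minimal sketch of what I mean (just a toy shape check with random tensors, not my graded code):

import tensorflow as tf

q = tf.random.uniform((3, 4))  # q shape (3, 4)
k = tf.random.uniform((4, 4))  # k shape (4, 4)
v = tf.random.uniform((4, 2))  # v shape (4, 2)

qk = tf.matmul(q, tf.transpose(k))   # (3, 4) x (4, 4) -> (3, 4)
print(qk.shape)                      # (3, 4)
print(tf.matmul(qk, v).shape)        # (3, 4) x (4, 2) -> (3, 2), not (3, 4)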
You’ve found an error in the doctext. None of that is true. I’ll submit a ticket to fix the issue.
If you look at the figure in the instructions, you need to compute Q times the transpose of K.
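(If I recall the figure correctly, it is just the standard scaled dot-product attention: softmax(Q·Kᵀ / sqrt(d_k)) · V, where d_k is the last dimension of K.)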
tf.matmul(q, tf.transpose(k)) ends up being shape (3, 4), and v remains shape (4, 2).
That means the only output I can get is (3, 2), or the transpose of that. The output needs to be shape (3, 4), so I'm still confused about what mistake I'm making.
Also, it is not true that the output needs to be shape (3, 4). Please note that we do the matmul between attention_weights and v, not between matmul_qk and v. We get attention_weights after applying the softmax, and then multiply by v. Below is the output of my notebook:
q shape: (3, 4)
k shape: (4, 4)
v shape: (4, 2)
matmul_qk shape: (3, 4)
attention_weights shape: (3, 4)
output shape: (3, 2)

q shape: (3, 4)
k shape: (4, 4)
v shape: (4, 2)
matmul_qk shape: (3, 4)
attention_weights shape: (1, 3, 4)
output shape: (1, 3, 2)
All tests passed
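In case it helps, here is a rough sketch of the computation those shapes come from. This is my own reconstruction for illustration, not the graded notebook code, and the mask handling below is just one common convention:

import tensorflow as tf

def scaled_dot_product_attention_sketch(q, k, v, mask=None):
    # Rough reconstruction for illustration, not the official notebook code.
    matmul_qk = tf.matmul(q, k, transpose_b=True)             # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(dk)              # scale by sqrt(d_k)
    if mask is not None:
        # Assumed convention: positions with mask == 1 are pushed toward -inf before the softmax.
        scaled_logits += (mask * -1e9)
    attention_weights = tf.nn.softmax(scaled_logits, axis=-1)  # (..., seq_len_q, seq_len_k)
    output = tf.matmul(attention_weights, v)                   # (..., seq_len_q, depth_v)
    return output, attention_weights

q = tf.random.uniform((3, 4))
k = tf.random.uniform((4, 4))
v = tf.random.uniform((4, 2))
output, attention_weights = scaled_dot_product_attention_sketch(q, k, v)
print(attention_weights.shape, output.shape)   # (3, 4) (3, 2), matching the shapes above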