I’m not sure the scaled_dot_product_attention function can be done correctly.
The doctext says q, k, and v must have matching leading dimensions, and that k and v must have matching penultimate dimensions; that's supposedly the only way to complete this exercise. Well, the given tensors simply don't match:
q shape (3, 4)
k shape (4, 4)
v shape (4, 2)
At no point is there any indication that I am meant to manipulate the dimensions of q, k, or v in any way other than transposing them.
qk can only be shape (3, 4) or shape (4, 3)
Therefore qk times v can only be shape (3, 2) or shape (2, 3)
The required output shape is (3, 4)
I passed all the previous functions. Where's my mistake? I'm really not sure what I'm missing.
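Here is a minimal sketch of what I mean (just a toy shape check with random tensors, not my graded code):

import tensorflow as tf

q = tf.random.uniform((3, 4))  # q shape (3, 4)
k = tf.random.uniform((4, 4))  # k shape (4, 4)
v = tf.random.uniform((4, 2))  # v shape (4, 2)

qk = tf.matmul(q, tf.transpose(k))   # (3, 4) x (4, 4) -> (3, 4)
print(qk.shape)                      # (3, 4)
print(tf.matmul(qk, v).shape)        # (3, 4) x (4, 2) -> (3, 2), not (3, 4)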
You’ve found an error in the doctext. None of that is true. I’ll submit a ticket to fix the issue.
If you look at the figure in the instructions, you need to compute Q times the transpose of K.
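(If I recall the figure correctly, it is just the standard scaled dot-product attention: softmax(Q·Kᵀ / sqrt(d_k)) · V, where d_k is the last dimension of K.)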
tf.matmul(q, tf.transpose(k)) ends up being shape (3, 4), and v remains shape (4, 2).
That means the only output I can get is (3, 2), or the transpose of that. The output needs to be shape (3, 4), so I'm still confused about what mistake I'm making.
Also, it is not true that the output needs to be shape (3, 4). Please note that we do the matmul between attention_weights and v, not between matmul_qk and v. We get attention_weights after applying the softmax, and then multiply by v. Below is the output of my notebook:
q shape: (3, 4)
k shape: (4, 4)
v shape: (4, 2)
matmul_qk shape: (3, 4)
attention_weights shape: (3, 4)
output shape: (3, 2)

q shape: (3, 4)
k shape: (4, 4)
v shape: (4, 2)
matmul_qk shape: (3, 4)
attention_weights shape: (1, 3, 4)
output shape: (1, 3, 2)
All tests passed
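In case it helps, here is a rough sketch of the computation those shapes come from. This is my own reconstruction for illustration, not the graded notebook code, and the mask handling below is just one common convention:

import tensorflow as tf

def scaled_dot_product_attention_sketch(q, k, v, mask=None):
    # Rough reconstruction for illustration, not the official notebook code.
    matmul_qk = tf.matmul(q, k, transpose_b=True)             # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(dk)              # scale by sqrt(d_k)
    if mask is not None:
        # Assumed convention: positions with mask == 1 are pushed toward -inf before the softmax.
        scaled_logits += (mask * -1e9)
    attention_weights = tf.nn.softmax(scaled_logits, axis=-1)  # (..., seq_len_q, seq_len_k)
    output = tf.matmul(attention_weights, v)                   # (..., seq_len_q, depth_v)
    return output, attention_weights

q = tf.random.uniform((3, 4))
k = tf.random.uniform((4, 4))
v = tf.random.uniform((4, 2))
output, attention_weights = scaled_dot_product_attention_sketch(q, k, v)
print(attention_weights.shape, output.shape)   # (3, 4) (3, 2), matching the shapes above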