Hi everyone!
I hope someone can help me with a problem in the scaled_dot_product_attention function that I have been stuck on for days. I have added some print statements so the shapes of several variables appear at the top of the output. I’ve tried a lot of things and currently have no idea how to fix this problem.
Starting with the shapes:
Q = (3,4)
K = (4,4)
V = (4,2)
This would make QK^T a (3,4) matrix.
Finally, when we do another dot product with V, the shape should be (3,2).
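To make the shape reasoning above concrete, here is a minimal NumPy sketch that uses the shapes from the post. It is only an illustration. The assignment itself requires TF ops, where the first product would be written as tf.matmul(q, k, transpose_b=True). The values are arbitrary placeholders, and the softmax is written out by hand.

```python
import numpy as np

# Shapes from the post; the values are arbitrary placeholders.
q = np.ones((3, 4))  # (seq_len_q, depth)
k = np.ones((4, 4))  # (seq_len_k, depth)
v = np.ones((4, 2))  # (seq_len_k, depth_v)

# Q K^T: (3, 4) @ (4, 4)^T -> (3, 4), i.e. (seq_len_q, seq_len_k).
# The TF counterpart is tf.matmul(q, k, transpose_b=True).
matmul_qk = np.matmul(q, k.T)

# Scale by sqrt(dk), where dk is the last dimension (depth) of K.
dk = k.shape[-1]
scaled_attention_logits = matmul_qk / np.sqrt(dk)

# Numerically stable softmax over the last axis (seq_len_k); shape stays (3, 4).
shifted = scaled_attention_logits - scaled_attention_logits.max(axis=-1, keepdims=True)
attention_weights = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

# Weights @ V: (3, 4) @ (4, 2) -> (3, 2), i.e. (seq_len_q, depth_v).
output = np.matmul(attention_weights, v)

print(matmul_qk.shape, attention_weights.shape, output.shape)  # (3, 4) (3, 4) (3, 2)
```

K is square here, so q @ k without the transpose also has shape (3, 4). With these particular shapes, a missing transpose produces wrong values rather than a shape error.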
@abou Following @balaji.ambresh’s advice, you probably just missed the transpose needed to get this result.
However, I recall from doing this assignment that you should be careful when you calculate scaled_attention_logits (make sure to read the instructions carefully).
I also remember finding ‘dk’ a little tricky, though I no longer remember why…
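One plausible pitfall with ‘dk’ is taking it from the wrong axis of K. This is an assumption, not necessarily what was tricky above. dk is the depth, i.e. K's last dimension. The NumPy sketch below uses a hypothetical non-square K so the two axes differ. In TF it could be obtained with something like tf.cast(tf.shape(k)[-1], tf.float32). The helper name scale_logits is made up for illustration.

```python
import numpy as np

def scale_logits(matmul_qk, k):
    """Hypothetical helper: divide Q K^T by sqrt(dk), with dk = K's last axis."""
    # dk is the depth of K, not seq_len_k; cast to float before the sqrt.
    dk = float(k.shape[-1])
    return matmul_qk / np.sqrt(dk)

# Hypothetical non-square K: seq_len_k = 5, depth = 4.
q = np.ones((3, 4))
k = np.ones((5, 4))
logits = scale_logits(np.matmul(q, k.T), k)

print(logits.shape)   # (3, 5)
print(logits[0, 0])   # each row of q . k sums four ones: 4 / sqrt(4) = 2.0
```

With the (4,4) K from the question, k.shape[0] and k.shape[-1] are both 4, so a wrong axis would not show up as a shape error there.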
Thank you very much @balaji.ambresh and @Nevermnd.
But the error persists even when I use ‘transpose_b=True’.
Also, why is (3,4) expected instead of (3,2), i.e. (…, seq_len_q, depth_v)?
@abou Hmm… are you keeping in mind that this is a matrix multiplication, not a dot product? And that you have to use TF operations, not NumPy?
Yes, I did use TF for this operation, not NumPy.
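The point that this is a matrix multiplication rather than a dot product matters most once tensors have a batch axis. The NumPy sketch below uses hypothetical shapes to show the difference. np.matmul (like tf.matmul) treats leading axes as a batch of matrices. np.dot on N-D arrays instead sums over the last axis of the first argument and the second-to-last axis of the second, which adds an extra axis. NumPy is used here only to illustrate, since the assignment expects TF ops.

```python
import numpy as np

# Hypothetical batched inputs: (batch, seq_len_q, depth) and (batch, depth, seq_len_k).
a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# Batched matrix multiplication: one (3, 4) @ (4, 5) product per batch element.
batched = np.matmul(a, b)
print(batched.shape)  # (2, 3, 5)

# np.dot pairs every batch of a with every batch of b, giving
# a.shape[:-1] + b.shape[:-2] + b.shape[-1:] = (2, 3) + (2,) + (5,).
dotted = np.dot(a, b)
print(dotted.shape)   # (2, 3, 2, 5)
```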
(3,4) is the shape expected at the assertion your code is failing, which checks the attention weights rather than the final output.
So check the assertion that compares the weights’ shape to (q.shape[0], k.shape[1]); that is where the shapes don’t match. Perhaps check how you scaled matmul_qk, or where you add the mask to the scaled tensor (did you add an extra tuple/dimension?).
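To illustrate how a mask can change the weights’ shape, here is a NumPy sketch with hypothetical masks. It assumes a convention where 1 marks positions to keep and masked logits get (1 - mask) * -1e9 added; check the assignment instructions for the exact convention. If the mask has an extra leading axis, broadcasting silently adds that axis to the result. TF addition broadcasts the same way. A shape check like the one described above then fails.

```python
import numpy as np

q = np.ones((3, 4))
k = np.ones((4, 4))
logits = np.zeros((3, 4))  # stands in for scaled_attention_logits: (seq_len_q, seq_len_k)

# Hypothetical masks (assumed convention: 1 = keep, 0 = mask out).
mask_2d = np.array([[1, 1, 0, 1]] * 3)    # shape (3, 4)
mask_3d = np.array([[[1, 1, 0, 1]] * 3])  # shape (1, 3, 4): one extra axis

masked_2d = logits + (1.0 - mask_2d) * -1e9
masked_3d = logits + (1.0 - mask_3d) * -1e9

expected = (q.shape[0], k.shape[1])
print(masked_2d.shape, masked_2d.shape == expected)  # (3, 4) True
print(masked_3d.shape, masked_3d.shape == expected)  # (1, 3, 4) False
```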