Week 4 A1 problem with scaled_dot_product_attention


Hi everyone!
I hope someone will be able to help me with the problem I have been facing for days with the scaled_dot_product_attention function. I have added some print statements so the shapes of several variables appear at the top of the output. I’ve tried a lot of things and currently have no idea how to fix this problem.

Starting with the shapes:
Q = (3,4)
K = (4,4)
V = (4,2)

This would make QK^T a (3,4) matrix.
Finally, when we do another dot product with V, the shape should be (3,2).
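
To make that shape reasoning concrete, here is a minimal check with dummy tensors of those shapes (made-up values, not the assignment’s test data):

```python
import tensorflow as tf

# Dummy tensors with the shapes from the post above (not the assignment's test values)
q = tf.random.uniform((3, 4))  # (seq_len_q, depth)
k = tf.random.uniform((4, 4))  # (seq_len_k, depth)
v = tf.random.uniform((4, 2))  # (seq_len_v, depth_v)

matmul_qk = tf.matmul(q, k, transpose_b=True)          # (3, 4) = (seq_len_q, seq_len_k)
attention_weights = tf.nn.softmax(matmul_qk, axis=-1)  # softmax keeps the shape: (3, 4)
output = tf.matmul(attention_weights, v)               # (3, 2) = (seq_len_q, depth_v)

print(matmul_qk.shape, output.shape)  # (3, 4) (3, 2)
```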


@abou, following @balaji.ambresh’s advice, you probably just missed the transpose needed to get this result.

However, I recall from when I did this assignment that you need to be careful when you calculate the scaled_attention_logits (make sure to read the instructions carefully).

And I remember finding ‘dk’ a little tricky, though I can’t recall why now…
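
For illustration only, the scaling step from the original Transformer formulation usually looks something like the sketch below; the assignment’s exact variable names and instructions may differ, so don’t treat this as the expected solution:

```python
import tensorflow as tf

def scaled_logits(q, k):
    # Sketch of the scaling step only (not the full assignment function).
    matmul_qk = tf.matmul(q, k, transpose_b=True)  # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)      # dk is the key depth, cast to float
    return matmul_qk / tf.math.sqrt(dk)            # divide by sqrt(dk), not by dk
```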


Thank you very much @balaji.ambresh and @Nevermnd.
But the error persists even when I use ‘transpose_b=True’.
Also, why is (3,4) the expected shape instead of (3,2) (i.e. (…, seq_len_q, depth_v))?

@abou Hmmm… keep in mind this is a matrix multiplication, not a dot product? And are you using TF operations, not numpy?


Yes, I did indeed use TF for this operation, not numpy.

(3,4) is the expected shape at the point where the code’s assertion is failing for you.

So you need to check the step where the attention weights are asserted against q.shape[0] and k.shape[1]; the shapes are not matching there. Perhaps check your code for how you scaled matmul_qk, or for where you add the mask to the scaled tensor (maybe an extra tuple was missed?).
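
In case the mask step is the culprit: the mask is typically applied by adding a large negative number to the logits before the softmax, so masked positions get near-zero attention weights. A rough sketch follows; note that the assignment may define the mask with the opposite convention (e.g. 1 for positions to keep), so check its instructions rather than copying this:

```python
import tensorflow as tf

def apply_mask(scaled_attention_logits, mask):
    # Sketch only: here mask == 1 marks positions to ignore, as in the
    # TensorFlow Transformer tutorial. The assignment may use the opposite
    # convention (e.g. (1 - mask) * -1e9), so follow its instructions.
    if mask is not None:
        scaled_attention_logits += (mask * -1e9)  # push masked logits toward -infinity
    return scaled_attention_logits
```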