C5W4 Transformer Network Exercise 3 - scaled_dot_product_attention

I got the error message below. I also printed out the dimensions of q, k, and v. The formula for the attention output is softmax(q*K.T) followed by a dot product with v. If the first part has dimension (3, 4) and v has dimension (4, 2), the dimension of the dot product should be (3, 2). Why is the expected dimension (3, 4)?

q shape [3, 4]
k shape [4, 4]
v shape [4, 2]
None
matmul_qk shape [3, 4]
dk 4
tf.Tensor(
[[0.2589478 0.42693272 0.15705977 0.15705977]
[0.2772748 0.2772748 0.2772748 0.16817567]
[0.33620113 0.33620113 0.12368149 0.2039163 ]], shape=(3, 4), dtype=float32)

AssertionError Traceback (most recent call last)
in
1 # UNIT TEST
----> 2 scaled_dot_product_attention_test(scaled_dot_product_attention)

~/work/W4A1/public_tests.py in scaled_dot_product_attention_test(target)
57 attention, weights = target(q, k, v, None)
58 assert tf.is_tensor(weights), "Weights must be a tensor"
---> 59 assert tuple(tf.shape(weights).numpy()) == (q.shape[0], k.shape[1]), f"Wrong shape. We expected ({q.shape[0]}, {k.shape[1]})"
60 assert np.allclose(weights, [[0.2589478, 0.42693272, 0.15705977, 0.15705977],
61 [0.2772748, 0.2772748, 0.2772748, 0.16817567],

AssertionError: Wrong shape. We expected (3, 4)

Perhaps your output shape is incorrect. It should be (3, 2) for the first test, and (1, 3, 2) for the second test.

That's probably where your problem originates. That's not what the math formula says: when Prof Ng writes QK^T, that does not mean elementwise multiply (*). When Prof Ng omits the operator, it means a dot-product style matrix multiply.
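For reference, here is a minimal sketch of that computation in TensorFlow, assuming the (q, k, v, mask) signature and the (output, attention_weights) return order that the unit test expects; the mask handling is illustrative, so follow the assignment's docstring for the exact convention:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask):
    # Matrix multiply, not elementwise *: (3, 4) x (4, 4)^T -> (3, 4)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(dk), the key depth
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Mask handling is an assumption here; use the convention from the assignment
    if mask is not None:
        scaled_attention_logits += (1.0 - mask) * -1e9

    # attention_weights: softmax over the key axis -> (3, 4) in the first test
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)

    # output: weights times V -> (3, 4) x (4, 2) = (3, 2)
    output = tf.matmul(attention_weights, v)

    return output, attention_weights
```

Note that the assert in your traceback checks the shape of the returned weights, which should be (3, 4); the (3, 2) shape belongs to the output.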


Thanks.
I realized that I had erroneously equated attention_weights with the output. After fixing that, the error was resolved.
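
In case it helps anyone else, a quick shape check with the same shapes as the first unit test (random values, using the hypothetical scaled_dot_product_attention sketch above) makes the distinction between the two return values explicit:

```python
import numpy as np
import tensorflow as tf

q = tf.constant(np.random.rand(3, 4), dtype=tf.float32)
k = tf.constant(np.random.rand(4, 4), dtype=tf.float32)
v = tf.constant(np.random.rand(4, 2), dtype=tf.float32)

output, attention_weights = scaled_dot_product_attention(q, k, v, None)
print(attention_weights.shape)  # (3, 4) -- the shape the failing assert checks
print(output.shape)             # (3, 2) -- the attention output itself
```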
