C5W4 Transformer Network Exercise 3 - scaled_dot_product_attention

I got the error message below. I also printed out the dimensions of q, k, and v. The formula for the attention output is softmax(q*K.T) followed by a dot product with v. If the first part has dimension (3, 4) and v has dimension (4, 2), the dimension of the dot product should be (3, 2). Why is the expected dimension (3, 4)?

q shape [3, 4]
k shape [4, 4]
v shape [4, 2]
None
matmul_qk shape [3, 4]
dk 4
tf.Tensor(
[[0.2589478 0.42693272 0.15705977 0.15705977]
[0.2772748 0.2772748 0.2772748 0.16817567]
[0.33620113 0.33620113 0.12368149 0.2039163 ]], shape=(3, 4), dtype=float32)

AssertionError Traceback (most recent call last)
in
1 # UNIT TEST
----> 2 scaled_dot_product_attention_test(scaled_dot_product_attention)

~/work/W4A1/public_tests.py in scaled_dot_product_attention_test(target)
57 attention, weights = target(q, k, v, None)
58 assert tf.is_tensor(weights), "Weights must be a tensor"
---> 59 assert tuple(tf.shape(weights).numpy()) == (q.shape[0], k.shape[1]), f"Wrong shape. We expected ({q.shape[0]}, {k.shape[1]})"
60 assert np.allclose(weights, [[0.2589478, 0.42693272, 0.15705977, 0.15705977],
61 [0.2772748, 0.2772748, 0.2772748, 0.16817567],

AssertionError: Wrong shape. We expected (3, 4)

Perhaps your output shape is incorrect. It should be (3, 2) for the first test, and (1, 3, 2) for the second test.

That's probably where your problem originates. That's not what the math formula says: when Prof Ng writes QK^T, that does not mean elementwise multiply (*). When Prof Ng omits the operator, it means a dot-product style matrix multiply.
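For reference, here is a minimal sketch of that computation in TensorFlow, assuming the (q, k, v, mask) signature and the (output, attention_weights) return order that the unit test expects; the mask handling is illustrative, so follow the assignment's docstring for the exact convention:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask):
    # Matrix multiply, not elementwise *: (3, 4) x (4, 4)^T -> (3, 4)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(dk), the key depth
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Mask handling is an assumption here; use the convention from the assignment
    if mask is not None:
        scaled_attention_logits += (1.0 - mask) * -1e9

    # attention_weights: softmax over the key axis -> (3, 4) in the first test
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)

    # output: weights times V -> (3, 4) x (4, 2) = (3, 2)
    output = tf.matmul(attention_weights, v)

    return output, attention_weights
```

Note that the assert in your traceback checks the shape of the returned weights, which should be (3, 4); the (3, 2) shape belongs to the output.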


Thanks.
I realized that I had erroneously equated attention_weights with the output. After fixing that, the error was resolved.
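
In case it helps anyone else, a quick shape check with the same shapes as the first unit test (random values, using the hypothetical scaled_dot_product_attention sketch above) makes the distinction between the two return values explicit:

```python
import numpy as np
import tensorflow as tf

q = tf.constant(np.random.rand(3, 4), dtype=tf.float32)
k = tf.constant(np.random.rand(4, 4), dtype=tf.float32)
v = tf.constant(np.random.rand(4, 2), dtype=tf.float32)

output, attention_weights = scaled_dot_product_attention(q, k, v, None)
print(attention_weights.shape)  # (3, 4) -- the shape the failing assert checks
print(output.shape)             # (3, 2) -- the attention output itself
```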
