I’m not sure the `scaled_dot_product_attention` function can be completed correctly.

q, k, and v must have matching leading dimensions, and k and v must have matching penultimate dimensions; that’s the only way to complete this exercise. Well, they simply don’t match.

q shape (3, 4)

k shape (4, 4)

v shape (4, 2)

At no point is there any indication that I am meant to manipulate the dimensions of q, k, or v in any way other than transposing them.

qk can only be shape (3, 4) or shape (4, 3).

Therefore qk times v can only be shape (3, 2) or shape (2, 3).

The required output shape is (3, 4).

On all previous functions, I passed. Where’s my mistake? I’m really not sure what I’m missing.
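The shape arithmetic in the post above can be checked with a quick sketch. This uses NumPy for illustration (the course notebook uses TensorFlow, but the matrix-multiplication shape rules are the same), with placeholder values in the stated shapes:

```python
import numpy as np

# Toy tensors with the shapes quoted in the exercise (values are placeholders)
q = np.ones((3, 4))
k = np.ones((4, 4))
v = np.ones((4, 2))

qk = q @ k.T   # (3, 4) @ (4, 4) -> (3, 4)
out = qk @ v   # (3, 4) @ (4, 2) -> (3, 2)
print(qk.shape, out.shape)  # (3, 4) (3, 2)
```

So with these inputs the final product is indeed (3, 2), which is the point the replies below address.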

You’ve found an error in the doctext. None of that is true. I’ll submit a ticket to fix the issue.

If you look at the figure in the instructions, you need to compute Q times the transpose of K.

`tf.matmul(q, tf.transpose(k))` ends up being shape (3, 4), and `v` remains shape (4, 2).

Meaning the only output I can get is (3, 2) or the transpose of that. The output needs to be shape (3, 4). I’m still confused as to what mistake I am making.

Also, this is not true.

Please note that we do the `matmul` between `attention_weights` and `v`, not between `matmul_qk` and `v`. We get `attention_weights` after applying the softmax, and then multiply it by `v`. Below is the output of my notebook:

```
q shape: (3, 4)
k shape: (4, 4)
v shape: (4, 2)
matmul_qk shape: (3, 4)
attention_weights shape: (3, 4)
output shape: (3, 2)
q shape: (3, 4)
k shape: (4, 4)
v shape: (4, 2)
matmul_qk shape: (3, 4)
attention_weights shape: (1, 3, 4)
output shape: (1, 3, 2)
All tests passed
```
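For anyone following along, a minimal sketch of the whole computation reproduces the shapes in that notebook output. This is not the course's TensorFlow implementation, just a NumPy illustration; the variable names mirror the notebook's (`matmul_qk`, `attention_weights`, `output`):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    # matmul_qk: (3, 4) @ (4, 4) -> (3, 4)
    matmul_qk = q @ k.T
    # Scale by sqrt of the key dimension
    scaled = matmul_qk / np.sqrt(k.shape[-1])
    if mask is not None:
        scaled += (1.0 - mask) * -1e9
    # Softmax over the last axis: attention_weights keeps shape (3, 4)
    e = np.exp(scaled - scaled.max(axis=-1, keepdims=True))
    attention_weights = e / e.sum(axis=-1, keepdims=True)
    # Final matmul is attention_weights @ v: (3, 4) @ (4, 2) -> (3, 2)
    output = attention_weights @ v
    return output, attention_weights

rng = np.random.default_rng(0)
q = rng.standard_normal((3, 4))
k = rng.standard_normal((4, 4))
v = rng.standard_normal((4, 2))
output, attention_weights = scaled_dot_product_attention(q, k, v)
print(attention_weights.shape, output.shape)  # (3, 4) (3, 2)
```

The softmax leaves the last dimension intact, so `attention_weights` is (3, 4) just like `matmul_qk`, and the final product with `v` is (3, 2), not (3, 4).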