I am getting a wrong shape error for my scaled_dot_product_attention function:
---> 59 assert tuple(tf.shape(weights).numpy()) == (q.shape[0], k.shape[1]), f"Wrong shape. We expected ({q.shape[0]}, {k.shape[1]})"
60 assert np.allclose(weights, [[0.2589478, 0.42693272, 0.15705977, 0.15705977],
61 [0.2772748, 0.2772748, 0.2772748, 0.16817567],
AssertionError: Wrong shape. We expected (3, 4)
Since I can add the mask to scaled_attention_logits, I don’t think that is the wrong shape. I did transpose k for the first matrix multiplication, and I did not transpose v in the second matmul. The traceback doesn’t actually say where in the function the problem occurs. Any ideas what I could be doing wrong?
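For reference, the computation is the standard softmax(q k^T / sqrt(dk)) v, with very negative values added to the logits wherever the mask blocks a position. Here is a minimal generic sketch, not my exact notebook code, and the (1 - mask) * -1e9 convention assumes 1 = keep, 0 = block:

import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # q: (..., seq_len_q, depth), k: (..., seq_len_k, depth), v: (..., seq_len_k, depth_v)
    matmul_qk = tf.matmul(q, k, transpose_b=True)        # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)
    if mask is not None:
        # assumed convention: mask == 1 keeps a position, mask == 0 blocks it
        scaled_attention_logits += (1.0 - tf.cast(mask, tf.float32)) * -1e9
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    output = tf.matmul(attention_weights, v)             # (..., seq_len_q, depth_v)
    return output, attention_weights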
Notice that it’s “throwing” about the shape of the weights, not the output. I added print statements to show all the relevant shapes in this function, and here’s what I see when I run that test cell:
q.shape (3, 4)
k.shape (4, 4)
v.shape (4, 2)
matmul_qk.shape (3, 4)
matmul_qk =
[[2. 3. 1. 1.]
[2. 2. 2. 1.]
[2. 2. 0. 1.]]
dk 4.0
type(scaled_attention_logits) <class 'tensorflow.python.framework.ops.EagerTensor'>
scaled_attention_logits.shape (3, 4)
attention_weights.shape (3, 4)
attention_weights =
[[0.2589478 0.42693272 0.15705977 0.15705977]
[0.2772748 0.2772748 0.2772748 0.16817567]
[0.33620113 0.33620113 0.12368149 0.2039163 ]]
sum(attention_weights(axis = -1)) =
[[1.0000001]
[1. ]
[1. ]]
output.shape (3, 2)
output =
[[0.74105227 0.15705977]
[0.7227253 0.16817567]
[0.6637989 0.2039163 ]]
q.shape (3, 4)
k.shape (4, 4)
v.shape (4, 2)
matmul_qk.shape (3, 4)
matmul_qk =
[[2. 3. 1. 1.]
[2. 2. 2. 1.]
[2. 2. 0. 1.]]
dk 4.0
type(scaled_attention_logits) <class 'tensorflow.python.framework.ops.EagerTensor'>
scaled_attention_logits.shape (3, 4)
mask.shape (1, 3, 4)
applying mask =
[[[1 1 0 1]
[1 1 0 1]
[1 1 0 1]]]
attention_weights.shape (1, 3, 4)
attention_weights =
[[[0.3071959 0.5064804 0. 0.18632373]
[0.38365173 0.38365173 0. 0.23269653]
[0.38365173 0.38365173 0. 0.23269653]]]
sum(attention_weights(axis = -1)) =
[[[1.]
[1.]
[1.]]]
output.shape (1, 3, 2)
output =
[[[0.6928041 0.18632373]
[0.61634827 0.23269653]
[0.61634827 0.23269653]]]
All tests passed
One way to debug this would be to add equivalent print statements, at least for the weights, in your code and compare your results to what I show above. That should shed some light on where to look in more detail.
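For example, debug prints along these lines inside your function (hypothetical names; rename them to whatever your intermediate variables are called):

print("q.shape", q.shape)
print("k.shape", k.shape)
print("v.shape", v.shape)
print("matmul_qk.shape", matmul_qk.shape)
if mask is not None:
    print("mask.shape", mask.shape)          # mask is None in the first test case
print("attention_weights.shape", attention_weights.shape)
print("output.shape", output.shape)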
Thanks. I added some print statements, and the problem seems to happen when I add the mask. When I try to print the shape of the mask, it says ‘NoneType’ object has no attribute ‘shape’. The mask is an input variable, though, so shouldn’t it have the correct shape when it’s passed in?
My attention weights have the same values as yours but the wrong shape.
attention_weights: tf.Tensor(
[[[0.2589478 0.42693272 0.15705977 0.15705977]
[0.2772748 0.2772748 0.2772748 0.16817565]
[0.33620113 0.33620113 0.12368149 0.2039163 ]]
[[0.2589478 0.42693272 0.15705977 0.15705977]
[0.2772748 0.2772748 0.2772748 0.16817565]
[0.33620113 0.33620113 0.12368149 0.2039163 ]]
[[0.3071959 0.5064804 0. 0.18632373]
[0.38365173 0.38365173 0. 0.23269653]
[0.38365173 0.38365173 0. 0.23269653]]], shape=(3, 3, 4), dtype=float32)
Yes, but note that the mask is not always included, right? There are two test cases: one with a mask and one without.
In the with-mask case I get shape (1, 3, 4), but it looks like you get shape (3, 3, 4). So how could that happen? Sorry, I am away from my computer for the next couple of hours, so I cannot look at the code to give a better hint.
It seems like the padding mask adds an extra dimension, and that is what is throwing things off. I tried using tf.squeeze to get rid of the extra dimension, but that didn’t work: the test then said my weight values were incorrect. That doesn’t really surprise me, since the dimension is obviously added deliberately. But I definitely only need the values from two of the three dimensions.
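Playing with shapes a bit, the extra leading dimension by itself looks harmless: it just comes from ordinary broadcasting when the 3-D mask is added to the 2-D logits, so no squeeze should be needed. A quick illustration with dummy tensors of the shapes from this thread:

import tensorflow as tf

logits = tf.zeros((3, 4))                      # scaled_attention_logits in the test
mask = tf.ones((1, 3, 4))                      # the mask passed into the function
masked = logits + (1.0 - mask) * -1e9
print(masked.shape)                            # (1, 3, 4): the mask's leading dim broadcasts in

# Adding a second, differently shaped mask (e.g. (3, 1, 4)) broadcasts further,
# which is one way to end up with (3, 3, 4) weights like the ones above.
second = tf.ones((3, 1, 4))
print((masked + (1.0 - second) * -1e9).shape)  # (3, 3, 4)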
Okay, I figured it out. I was adding the mask twice - once where I was supposed to and again inside the softmax (I was calling create_padding_mask and making a new mask inside the softmax instead of just using the mask provided to the function).
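So the fix reduces to a single masking step that uses only the mask argument passed into the function, roughly (same assumed 1 = keep convention as in the sketch above):

if mask is not None:
    # use the mask argument as-is; do not rebuild it with create_padding_mask here
    scaled_attention_logits += (1.0 - tf.cast(mask, tf.float32)) * -1e9
attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)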
It’s good news that you found the issue! Thanks for confirming.