Course 5 Week 4 Exercise 3

I’m getting the below error on my implementation of scaled_dot_product_attention.

I’ve done the following:

  • Used np.matmul() to combine q and k
  • Used k.shape[-1] and np.sqrt() to scale the above to get scaled_attention_logits
  • Applied the mask by adding (1.0 - mask) * -1e9
  • Used tf.keras.activations.softmax() with axis=-1 to compute the softmax
  • Then used np.matmul() to combine attention_weights and v to get the output.

Please can someone assist?

AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>
      1 # UNIT TEST
----> 2 scaled_dot_product_attention_test(scaled_dot_product_attention)

~/work/W4A1/public_tests.py in scaled_dot_product_attention_test(target)
     55     v = np.array([[0, 0], [1, 0], [1, 0], [1, 1]]).astype(np.float32)
     56
---> 57     attention, weights = target(q, k, v, None)
     58     assert tf.is_tensor(weights), "Weights must be a tensor"
     59     assert tuple(tf.shape(weights).numpy()) == (q.shape[0], k.shape[1]), f"Wrong shape. We expected ({q.shape[0]}, {k.shape[1]})"

<ipython-input-...> in scaled_dot_product_attention(q, k, v, mask)
     33     # softmax is normalized on the last axis (seq_len_k) so that the scores
     34     # add up to 1.
---> 35     attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)
     36     # (..., seq_len_q, seq_len_k)
     37

/opt/conda/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
    199   """Call target, and fall back on dispatchers if there is a TypeError."""
    200   try:
--> 201     return target(*args, **kwargs)
    202   except (TypeError, ValueError):
    203     # Note: convert_to_eager_tensor currently raises a ValueError, not a

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/activations.py in softmax(x, axis)
     72       ValueError: In case dim(x) == 1.
     73   """
---> 74   rank = x.shape.rank
     75   if rank == 2:
     76     output = nn.softmax(x)

AttributeError: 'tuple' object has no attribute 'rank'

I’m going to guess that there is a problem with your code that computes scaled_attention_logits.

Also, you don’t really need axis = -1 when you call the softmax activation.
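If you want to convince yourself of that, here is a tiny standalone check (the logits values are just made up for illustration) showing that the default axis is already -1:

import tensorflow as tf

# tf.keras.activations.softmax normalizes over the last axis by default,
# so these two calls give identical results.
logits = tf.constant([[2.0, 3.0, 1.0, 1.0]])
a = tf.keras.activations.softmax(logits)
b = tf.keras.activations.softmax(logits, axis=-1)
print(tf.reduce_all(tf.equal(a, b)).numpy())  # True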

Right, the first thing to check is the type of your scaled_attention_logits. Looks like it might be a python tuple. I added a bunch of print statements to show intermediate state in that function and here’s what I see:

q.shape (3, 4)
k.shape (4, 4)
v.shape (4, 2)
matmul_qk.shape (3, 4)
matmul_qk =
[[2. 3. 1. 1.]
 [2. 2. 2. 1.]
 [2. 2. 0. 1.]]
dk 4.0
type(scaled_attention_logits) <class 'tensorflow.python.framework.ops.EagerTensor'>
scaled_attention_logits.shape (3, 4)
attention_weights.shape (3, 4)
attention_weights =
[[0.2589478  0.42693272 0.15705977 0.15705977]
 [0.2772748  0.2772748  0.2772748  0.16817567]
 [0.33620113 0.33620113 0.12368149 0.2039163 ]]
sum(attention_weights(axis = -1)) =
[[1.0000001]
 [1.       ]
 [1.       ]]
output.shape (3, 2)
output =
[[0.74105227 0.15705977]
 [0.7227253  0.16817567]
 [0.6637989  0.2039163 ]]
q.shape (3, 4)
k.shape (4, 4)
v.shape (4, 2)
matmul_qk.shape (3, 4)
matmul_qk =
[[2. 3. 1. 1.]
 [2. 2. 2. 1.]
 [2. 2. 0. 1.]]
dk 4.0
type(scaled_attention_logits) <class 'tensorflow.python.framework.ops.EagerTensor'>
scaled_attention_logits.shape (3, 4)
mask.shape (1, 3, 4)
applying mask =
[[[1 1 0 1]
  [1 1 0 1]
  [1 1 0 1]]]
attention_weights.shape (1, 3, 4)
attention_weights =
[[[0.3071959  0.5064804  0.         0.18632373]
  [0.38365173 0.38365173 0.         0.23269653]
  [0.38365173 0.38365173 0.         0.23269653]]]
sum(attention_weights(axis = -1)) =
[[[1.]
  [1.]
  [1.]]]
output.shape (1, 3, 2)
output =
[[[0.6928041  0.18632373]
  [0.61634827 0.23269653]
  [0.61634827 0.23269653]]]
All tests passed

Note that scaled_attention_logits should be an EagerTensor. So how could that go sideways?
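In case it helps, here is roughly the kind of instrumentation I mean: a sketch of temporary print statements you could drop into your own scaled_dot_product_attention (the variable names assume you followed the notebook's naming, so adjust them to match your code):

# Temporary debug prints inside scaled_dot_product_attention; remove them
# once the unit test passes. "mask" can be None, hence the guard.
print("q.shape", q.shape)
print("k.shape", k.shape)
print("v.shape", v.shape)
print("matmul_qk.shape", matmul_qk.shape)
print("type(scaled_attention_logits)", type(scaled_attention_logits))
print("scaled_attention_logits.shape", scaled_attention_logits.shape)
if mask is not None:
    print("mask.shape", mask.shape)
print("attention_weights.shape", attention_weights.shape)
print("output.shape", output.shape)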

Thanks.

The values in my matmul_qk are different to yours. This is what I’m getting:
[[1. 2. 1. 2.]
 [1. 1. 2. 2.]
 [1. 1. 0. 2.]]

I don’t understand this. It’s simply done using np.matmul() on q and k. So why is it different? Should I not be using numpy here?

Secondly, you are correct - the type of my scaled_attention_logits is also different from yours: mine is <class 'numpy.ndarray'>. However, the shape agrees with yours. This array is just matmul_qk divided by a scalar (the square root of dk), so I don't understand why the type differs.

Please assist.

No, you are supposed to use the TensorFlow package and its functions here. NumPy and TensorFlow are different libraries. Assuming all else is correct, that should fix it; and as Tom says, there is also no need to pass the axis argument to the softmax activation.

Yes, the answers may well come out the same if you use numpy, but you have to be very careful about mixing numpy and TF: TF generates gradients automatically, and every node in the compute graph needs to be a TF function; otherwise the graph is broken and you don't get gradients. So you can use numpy only for things that are essentially constants and not part of the training.
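Here is a small standalone illustration of that point (not part of the assignment code, just made-up values): GradientTape can track tf.matmul, but as soon as a value goes through np.matmul it becomes a plain numpy array and drops out of the graph.

import numpy as np
import tensorflow as tf

x = tf.Variable([[1.0, 2.0]])
w = tf.constant([[3.0], [4.0]])

with tf.GradientTape() as tape:
    y_tf = tf.matmul(x, w)            # EagerTensor: the tape records this op
    z_np = np.matmul(y_tf, [[2.0]])   # numpy takes over: the result is a plain ndarray

print(type(y_tf).__name__)     # EagerTensor
print(type(z_np).__name__)     # ndarray
print(tape.gradient(y_tf, x))  # gradients flow back through the TF op
# There is no way to ask for d(z_np)/dx: the np.matmul step is invisible
# to the tape, so the gradient chain stops at y_tf.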

Please take a closer look at the formula for matmul_qk: it’s not just a matrix multiply, right? You have to transpose k as well. It’s square, as we see from the shapes, so the multiply still works from a dimensionality p.o.v., but you get a different answer if you don’t do the transpose. I tried making that mistake and I get exactly the wrong answer that you show. The details always matter here. :nerd_face:
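To make the transpose point concrete, here is a tiny standalone sketch (made-up 2x2 values, not the unit-test data) of the Q.K^T step written with TF ops; with a non-symmetric k you can see that the two orderings give different answers:

import tensorflow as tf

q = tf.constant([[1.0, 2.0]])
k = tf.constant([[1.0, 0.0],
                 [3.0, 1.0]])

without_transpose = tf.matmul(q, k)              # q @ k    -> [[7. 2.]]
matmul_qk = tf.matmul(q, k, transpose_b=True)    # q @ k^T  -> [[1. 5.]]
print(without_transpose.numpy(), matmul_qk.numpy())

# Scaling done with TF ops as well, so everything stays an EagerTensor:
dk = tf.cast(tf.shape(k)[-1], tf.float32)
scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)
print(type(scaled_attention_logits).__name__)    # EagerTensor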

Thanks so much.