Course 5 Week 4 Exercise 3

I’m getting the below error on my implementation of scaled_dot_product_attention.

I’ve done the following:

  • Used np.matmul() to combine q and k
  • Used k.shape[-1] and np.sqrt() to scale the above to get scaled_attention_logits
  • Applied the mask by adding (1.0 - mask) * -1e9
  • Used tf.keras.activations.softmax() with axis=-1 to compute the softmax
  • Then used np.matmul() to combine attention_weights and v to get the output.

Please can someone assist?

AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>
      1 # UNIT TEST
----> 2 scaled_dot_product_attention_test(scaled_dot_product_attention)

~/work/W4A1/public_tests.py in scaled_dot_product_attention_test(target)
     55     v = np.array([[0, 0], [1, 0], [1, 0], [1, 1]]).astype(np.float32)
     56
---> 57     attention, weights = target(q, k, v, None)
     58     assert tf.is_tensor(weights), "Weights must be a tensor"
     59     assert tuple(tf.shape(weights).numpy()) == (q.shape[0], k.shape[1]), f"Wrong shape. We expected ({q.shape[0]}, {k.shape[1]})"

<ipython-input-...> in scaled_dot_product_attention(q, k, v, mask)
     33     # softmax is normalized on the last axis (seq_len_k) so that the scores
     34     # add up to 1.
---> 35     attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)
     36     # (..., seq_len_q, seq_len_k)
     37

/opt/conda/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
    199   """Call target, and fall back on dispatchers if there is a TypeError."""
    200   try:
--> 201     return target(*args, **kwargs)
    202   except (TypeError, ValueError):
    203     # Note: convert_to_eager_tensor currently raises a ValueError, not a

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/activations.py in softmax(x, axis)
     72       ValueError: In case dim(x) == 1.
     73   """
---> 74   rank = x.shape.rank
     75   if rank == 2:
     76     output = nn.softmax(x)

AttributeError: 'tuple' object has no attribute 'rank'

I’m going to guess that there is a problem with your code that computes scaled_attention_logits.

Also, you don’t really need axis = -1 when you call the softmax activation.
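If you want to convince yourself of that, here is a tiny standalone check (the logits values are just made up for illustration) showing that the default axis is already -1:

import tensorflow as tf

# tf.keras.activations.softmax normalizes over the last axis by default,
# so these two calls give identical results.
logits = tf.constant([[2.0, 3.0, 1.0, 1.0]])
a = tf.keras.activations.softmax(logits)
b = tf.keras.activations.softmax(logits, axis=-1)
print(tf.reduce_all(tf.equal(a, b)).numpy())  # True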

Right, the first thing to check is the type of your scaled_attention_logits. Looks like it might be a python tuple. I added a bunch of print statements to show intermediate state in that function and here’s what I see:

q.shape (3, 4)
k.shape (4, 4)
v.shape (4, 2)
matmul_qk.shape (3, 4)
matmul_qk =
[[2. 3. 1. 1.]
 [2. 2. 2. 1.]
 [2. 2. 0. 1.]]
dk 4.0
type(scaled_attention_logits) <class 'tensorflow.python.framework.ops.EagerTensor'>
scaled_attention_logits.shape (3, 4)
attention_weights.shape (3, 4)
attention_weights =
[[0.2589478  0.42693272 0.15705977 0.15705977]
 [0.2772748  0.2772748  0.2772748  0.16817567]
 [0.33620113 0.33620113 0.12368149 0.2039163 ]]
sum(attention_weights(axis = -1)) =
[[1.0000001]
 [1.       ]
 [1.       ]]
output.shape (3, 2)
output =
[[0.74105227 0.15705977]
 [0.7227253  0.16817567]
 [0.6637989  0.2039163 ]]
q.shape (3, 4)
k.shape (4, 4)
v.shape (4, 2)
matmul_qk.shape (3, 4)
matmul_qk =
[[2. 3. 1. 1.]
 [2. 2. 2. 1.]
 [2. 2. 0. 1.]]
dk 4.0
type(scaled_attention_logits) <class 'tensorflow.python.framework.ops.EagerTensor'>
scaled_attention_logits.shape (3, 4)
mask.shape (1, 3, 4)
applying mask =
[[[1 1 0 1]
  [1 1 0 1]
  [1 1 0 1]]]
attention_weights.shape (1, 3, 4)
attention_weights =
[[[0.3071959  0.5064804  0.         0.18632373]
  [0.38365173 0.38365173 0.         0.23269653]
  [0.38365173 0.38365173 0.         0.23269653]]]
sum(attention_weights(axis = -1)) =
[[[1.]
  [1.]
  [1.]]]
output.shape (1, 3, 2)
output =
[[[0.6928041  0.18632373]
  [0.61634827 0.23269653]
  [0.61634827 0.23269653]]]
All tests passed

Note that scaled_attention_logits should be an EagerTensor. So how could that go sideways?
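In case it helps, here is roughly the kind of instrumentation I mean: a sketch of temporary print statements you could drop into your own scaled_dot_product_attention (the variable names assume you followed the notebook's naming, so adjust them to match your code):

# Temporary debug prints inside scaled_dot_product_attention; remove them
# once the unit test passes. "mask" can be None, hence the guard.
print("q.shape", q.shape)
print("k.shape", k.shape)
print("v.shape", v.shape)
print("matmul_qk.shape", matmul_qk.shape)
print("type(scaled_attention_logits)", type(scaled_attention_logits))
print("scaled_attention_logits.shape", scaled_attention_logits.shape)
if mask is not None:
    print("mask.shape", mask.shape)
print("attention_weights.shape", attention_weights.shape)
print("output.shape", output.shape)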

Thanks.

The values in my matmul_qk are different to yours. This is what I’m getting:
[[1. 2. 1. 2.]
 [1. 1. 2. 2.]
 [1. 1. 0. 2.]]

I don’t understand this. It’s simply done using np.matmul() on q and k. So why is it different? Should I not be using numpy here?

Secondly, you are correct - the type of my scaled_attention_logits is also different from yours: mine is <class 'numpy.ndarray'>. However, the shape agrees with yours. This array is just matmul_qk divided by a scalar (the square root of dk), so I don't understand why the type differs.

Please assist.

No, you are supposed to use the TensorFlow package and its functions here. NumPy and TensorFlow are different libraries. Assuming all else is correct, that should fix it; and as Tom says, there is also no need to pass the axis argument to the softmax activation.

Yes, the answers may well come out the same if you use numpy, but you have to be very careful about mixing numpy and TF: TF generates gradients automatically, and every node in the compute graph needs to be a TF function; otherwise the graph is broken and you don't get gradients. So you can use numpy only for things that are essentially constants and not part of the training.
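Here is a small standalone illustration of that point (not part of the assignment code, just made-up values): GradientTape can track tf.matmul, but as soon as a value goes through np.matmul it becomes a plain numpy array and drops out of the graph.

import numpy as np
import tensorflow as tf

x = tf.Variable([[1.0, 2.0]])
w = tf.constant([[3.0], [4.0]])

with tf.GradientTape() as tape:
    y_tf = tf.matmul(x, w)            # EagerTensor: the tape records this op
    z_np = np.matmul(y_tf, [[2.0]])   # numpy takes over: the result is a plain ndarray

print(type(y_tf).__name__)     # EagerTensor
print(type(z_np).__name__)     # ndarray
print(tape.gradient(y_tf, x))  # gradients flow back through the TF op
# There is no way to ask for d(z_np)/dx: the np.matmul step is invisible
# to the tape, so the gradient chain stops at y_tf.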

Please take a closer look at the formula for matmul_qk: it’s not just a matrix multiply, right? You have to transpose k as well. It’s square, as we see from the shapes, so the multiply still works from a dimensionality p.o.v., but you get a different answer if you don’t do the transpose. I tried making that mistake and I get exactly the wrong answer that you show. The details always matter here. :nerd_face:
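To make the transpose point concrete, here is a tiny standalone sketch (made-up 2x2 values, not the unit-test data) of the Q.K^T step written with TF ops; with a non-symmetric k you can see that the two orderings give different answers:

import tensorflow as tf

q = tf.constant([[1.0, 2.0]])
k = tf.constant([[1.0, 0.0],
                 [3.0, 1.0]])

without_transpose = tf.matmul(q, k)              # q @ k    -> [[7. 2.]]
matmul_qk = tf.matmul(q, k, transpose_b=True)    # q @ k^T  -> [[1. 5.]]
print(without_transpose.numpy(), matmul_qk.numpy())

# Scaling done with TF ops as well, so everything stays an EagerTensor:
dk = tf.cast(tf.shape(k)[-1], tf.float32)
scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)
print(type(scaled_attention_logits).__name__)    # EagerTensor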

Thanks so much.