I am getting a wrong shape error for my scaled_dot_product_attention function:
---> 59 assert tuple(tf.shape(weights).numpy()) == (q.shape[0], k.shape[1]), f"Wrong shape. We expected ({q.shape[0]}, {k.shape[1]})"
60 assert np.allclose(weights, [[0.2589478, 0.42693272, 0.15705977, 0.15705977],
61 [0.2772748, 0.2772748, 0.2772748, 0.16817567],
AssertionError: Wrong shape. We expected (3, 4)
Since I can add the mask to scaled_attention_logits, I don’t think that is the wrong shape. I did transpose k for the first matrix multiplication, and I did not transpose v in the second matmul. The traceback doesn’t actually say where in the function the problem occurs. Any ideas what I could be doing wrong?
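For reference, the computation is the standard softmax(q k^T / sqrt(dk)) v, with very negative values added to the logits wherever the mask blocks a position. Here is a minimal generic sketch, not my exact notebook code, and the (1 - mask) * -1e9 convention assumes 1 = keep, 0 = block:

import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # q: (..., seq_len_q, depth), k: (..., seq_len_k, depth), v: (..., seq_len_k, depth_v)
    matmul_qk = tf.matmul(q, k, transpose_b=True)        # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)
    if mask is not None:
        # assumed convention: mask == 1 keeps a position, mask == 0 blocks it
        scaled_attention_logits += (1.0 - tf.cast(mask, tf.float32)) * -1e9
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    output = tf.matmul(attention_weights, v)             # (..., seq_len_q, depth_v)
    return output, attention_weights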
Notice that it’s “throwing” about the shape of the weights, not the output. I added print statements to show all the relevant shapes in this function, and here’s what I see when I run that test cell:
q.shape (3, 4)
k.shape (4, 4)
v.shape (4, 2)
matmul_qk.shape (3, 4)
matmul_qk =
[[2. 3. 1. 1.]
[2. 2. 2. 1.]
[2. 2. 0. 1.]]
dk 4.0
type(scaled_attention_logits) <class 'tensorflow.python.framework.ops.EagerTensor'>
scaled_attention_logits.shape (3, 4)
attention_weights.shape (3, 4)
attention_weights =
[[0.2589478 0.42693272 0.15705977 0.15705977]
[0.2772748 0.2772748 0.2772748 0.16817567]
[0.33620113 0.33620113 0.12368149 0.2039163 ]]
sum(attention_weights(axis = -1)) =
[[1.0000001]
[1. ]
[1. ]]
output.shape (3, 2)
output =
[[0.74105227 0.15705977]
[0.7227253 0.16817567]
[0.6637989 0.2039163 ]]
q.shape (3, 4)
k.shape (4, 4)
v.shape (4, 2)
matmul_qk.shape (3, 4)
matmul_qk =
[[2. 3. 1. 1.]
[2. 2. 2. 1.]
[2. 2. 0. 1.]]
dk 4.0
type(scaled_attention_logits) <class 'tensorflow.python.framework.ops.EagerTensor'>
scaled_attention_logits.shape (3, 4)
mask.shape (1, 3, 4)
applying mask =
[[[1 1 0 1]
[1 1 0 1]
[1 1 0 1]]]
attention_weights.shape (1, 3, 4)
attention_weights =
[[[0.3071959 0.5064804 0. 0.18632373]
[0.38365173 0.38365173 0. 0.23269653]
[0.38365173 0.38365173 0. 0.23269653]]]
sum(attention_weights(axis = -1)) =
[[[1.]
[1.]
[1.]]]
output.shape (1, 3, 2)
output =
[[[0.6928041 0.18632373]
[0.61634827 0.23269653]
[0.61634827 0.23269653]]]
All tests passed
One way to debug this would be to add equivalent print statements, at least for the weights, in your code and compare your results to what I show above. That should shed some light on where to look in more detail.
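For example, debug prints along these lines inside your function (hypothetical names; rename them to whatever your intermediate variables are called):

print("q.shape", q.shape)
print("k.shape", k.shape)
print("v.shape", v.shape)
print("matmul_qk.shape", matmul_qk.shape)
if mask is not None:
    print("mask.shape", mask.shape)          # mask is None in the first test case
print("attention_weights.shape", attention_weights.shape)
print("output.shape", output.shape)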
Thanks. I added some print statements, and the problem seems to happen when I add the mask. When I try to print the shape of the mask, it says ‘NoneType’ object has no attribute ‘shape’. The mask is an input variable, though, so shouldn’t it have the correct shape when it’s passed in?
My attention weights have the same values as yours but the wrong shape.
attention_weights: tf.Tensor(
[[[0.2589478 0.42693272 0.15705977 0.15705977]
[0.2772748 0.2772748 0.2772748 0.16817565]
[0.33620113 0.33620113 0.12368149 0.2039163 ]]
[[0.2589478 0.42693272 0.15705977 0.15705977]
[0.2772748 0.2772748 0.2772748 0.16817565]
[0.33620113 0.33620113 0.12368149 0.2039163 ]]
[[0.3071959 0.5064804 0. 0.18632373]
[0.38365173 0.38365173 0. 0.23269653]
[0.38365173 0.38365173 0. 0.23269653]]], shape=(3, 3, 4), dtype=float32)
Yes, but note that the mask is not always included, right? There are two test cases: one with a mask and one without.
In the with-mask case I get shape (1, 3, 4), but it looks like you get shape (3, 3, 4). So how could that happen? Sorry, I am away from my computer for the next couple of hours, so I cannot look at the code to give a better hint.
It seems like the padding mask adds an extra dimension, and that is what is throwing things off. I tried using tf.squeeze to get rid of the extra dimension, but that didn’t work: the test then said my weight values were incorrect. That doesn’t really surprise me, since the dimension is obviously added deliberately. But I definitely only need the values from two of the three dimensions.
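Playing with shapes a bit, the extra leading dimension by itself looks harmless: it just comes from ordinary broadcasting when the 3-D mask is added to the 2-D logits, so no squeeze should be needed. A quick illustration with dummy tensors of the shapes from this thread:

import tensorflow as tf

logits = tf.zeros((3, 4))                      # scaled_attention_logits in the test
mask = tf.ones((1, 3, 4))                      # the mask passed into the function
masked = logits + (1.0 - mask) * -1e9
print(masked.shape)                            # (1, 3, 4): the mask's leading dim broadcasts in

# Adding a second, differently shaped mask (e.g. (3, 1, 4)) broadcasts further,
# which is one way to end up with (3, 3, 4) weights like the ones above.
second = tf.ones((3, 1, 4))
print((masked + (1.0 - second) * -1e9).shape)  # (3, 3, 4)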
Okay, I figured it out. I was adding the mask twice - once where I was supposed to and again inside the softmax (I was calling create_padding_mask and making a new mask inside the softmax instead of just using the mask provided to the function).
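So the fix reduces to a single masking step that uses only the mask argument passed into the function, roughly (same assumed 1 = keep convention as in the sketch above):

if mask is not None:
    # use the mask argument as-is; do not rebuild it with create_padding_mask here
    scaled_attention_logits += (1.0 - tf.cast(mask, tf.float32)) * -1e9
attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)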
It’s good news that you found the issue! Thanks for confirming.