C5 W4 A1 Ex-3 Bad attention_weights values

I am having trouble with the values for attention_weights that my function is returning, which fails the unit test because the values are not the same.

Below is my code, with all operations commented out so as to observe the Honor Code, plus my output and the unit test error.

Clearly my values aren’t matching up, but I don’t see why, since in the first unit test case the mask isn’t even used.

CODE:

   # START CODE HERE

    print("k =\n", k)
    print("q =\n", q)
    print("v =\n", v)
    
    matmul_qk = # use tensorflow matrix multiplier on q and k

    print("matmul_qk =\n", matmul_qk)
    
    # scale matmul_qk
    dk = # get the seq_len_k value from the shape of k
    scaled_attention_logits = # divide the matmul_qk value by the numpy sqrt of dk
    print("dk =\n", dk)
    print("scaled_attention_logits =\n", scaled_attention_logits)

    print("mask =\n", mask)
    
    # add the mask to the scaled tensor.
    if mask is not None: # Don't replace this None
        scaled_attention_logits += # add the one minus mask times -1e9 as described

    # softmax is normalized on the last axis (seq_len_k) so that the scores
    # add up to 1.
    attention_weights = # use the tensorflow softmax on the logits
    print("attention_weights =\n", attention_weights)
    
    output = # again use the tf matrix multiplier on the logits and v
    print("output =\n", output)
    
    # END CODE HERE

OUTPUT:

k =
 [[1. 1. 0. 1.]
 [1. 0. 1. 1.]
 [0. 1. 1. 0.]
 [0. 0. 0. 1.]]
q =
 [[1. 0. 1. 1.]
 [0. 1. 1. 1.]
 [1. 0. 0. 1.]]
v =
 [[0. 0.]
 [1. 0.]
 [1. 0.]
 [1. 1.]]
matmul_qk =
 tf.Tensor(
[[1. 2. 1. 2.]
 [1. 1. 2. 2.]
 [1. 1. 0. 2.]], shape=(3, 4), dtype=float32)
dk =
 4
scaled_attention_logits =
 tf.Tensor(
[[0.5 1.  0.5 1. ]
 [0.5 0.5 1.  1. ]
 [0.5 0.5 0.  1. ]], shape=(3, 4), dtype=float32)
mask =
 None
attention_weights =
 tf.Tensor(
[[0.18877034 0.31122968 0.18877034 0.31122968]
 [0.18877034 0.18877034 0.31122968 0.31122968]
 [0.23500371 0.23500371 0.14253695 0.3874556 ]], shape=(3, 4), dtype=float32)
output =
 tf.Tensor(
[[2.5 1. ]
 [2.5 1. ]
 [1.5 1. ]], shape=(3, 2), dtype=float32)

ERROR:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-40-00665b20febb> in <module>
      1 # UNIT TEST
----> 2 scaled_dot_product_attention_test(scaled_dot_product_attention)

~/work/W4A1/public_tests.py in scaled_dot_product_attention_test(target)
     60     assert np.allclose(weights, [[0.2589478,  0.42693272, 0.15705977, 0.15705977],
     61                                    [0.2772748,  0.2772748,  0.2772748,  0.16817567],
---> 62                                    [0.33620113, 0.33620113, 0.12368149, 0.2039163 ]])
     63 
     64     assert tf.is_tensor(attention), "Output must be a tensor"

AssertionError: 

Hi,

Are you calculating matmul_qk with the transpose of k?

I guess not. I didn’t even notice a requirement for a transformation.

Is it just transposed?

Yes. The other thing to check is how you are calculating dk: according to your output it is a plain integer, but you may want a tensor. Check that as well.
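For anyone following along, here is a minimal sketch of why the transpose matters, using the q and k values from the printed output above. It is written in plain Python (only the standard `math` module) so it runs without TensorFlow, and it is not the assignment code: the helper names `matmul`, `transpose` and `softmax_rows` are made up for this illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Softmax over the last axis, so that each row sums to 1."""
    result = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result

# Values copied from the printed output in the question.
q = [[1., 0., 1., 1.],
     [0., 1., 1., 1.],
     [1., 0., 0., 1.]]
k = [[1., 1., 0., 1.],
     [1., 0., 1., 1.],
     [0., 1., 1., 0.],
     [0., 0., 0., 1.]]

dk = len(k[0])  # 4 in this test case

# q times k without the transpose: reproduces the matmul_qk printed in the question.
without_transpose = matmul(q, k)

# q times the transpose of k: each entry is the dot product of one query row
# with one key row.
with_transpose = matmul(q, transpose(k))

scaled = [[x / math.sqrt(dk) for x in row] for row in with_transpose]
weights = softmax_rows(scaled)

for row in weights:
    print([round(w, 8) for w in row])
```

With the transpose, the weights agree with the values in the failing assertion to within floating-point tolerance. Because k happens to be square (4 x 4) in this test, multiplying without the transpose does not raise a shape error, which is why the problem only shows up as wrong numbers. Separately, the `output` printed in the question is the product of the scaled logits and v; the attention output is normally computed from `attention_weights` and v, so that line is worth a second look too.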


Wow, many thanks for your help, Juan. It had been so long since we used the T superscript for transpose that I didn’t even notice it until you pointed it out.

Everything’s working!

Great! Thank you! If there’s any issue down the road, we will be happy to assist.