My scaled_dot_product_attention function is returning attention_weights values that don’t match what the unit test expects, so the assert fails.
Below is my code, with the actual operations commented out to observe the Honor Code, followed by my printed output and the unit test error.
Clearly my values aren’t matching up, but I don’t see why, since in the first test case the mask isn’t even used.
CODE:
# START CODE HERE
print("k =\n", k)
print("q =\n", q)
print("v =\n", v)
matmul_qk = # use the TensorFlow matrix multiplier on q and k
print("matmul_qk =\n", matmul_qk)
# scale matmul_qk
dk = # get the seq_len_k value from the shape of k
scaled_attention_logits = # divide the matmul_qk value by the numpy sqrt of dk
print("dk =\n", dk)
print("scaled_attention_logits =\n", scaled_attention_logits)
print("mask =\n", mask)
# add the mask to the scaled tensor.
if mask is not None: # Don't replace this None
    scaled_attention_logits += # add (1 - mask) * -1e9, as described
# softmax is normalized on the last axis (seq_len_k) so that the scores
# add up to 1.
attention_weights = # use the tensorflow softmax on the logits
print("attention_weights =\n", attention_weights)
output = # again use the tf matrix multiplier on the logits and v
print("output =\n", output)
# END CODE HERE
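For reference, here is a small standalone NumPy sketch of the published formula Attention(Q, K, V) = softmax(QK^T / sqrt(dk)) V, applied to the same printed inputs. I wrote it from scratch against the paper, so it is only the textbook recipe, not my graded TensorFlow code, and it assumes dk is the depth of the key vectors (which coincides with seq_len_k here, since k is 4x4):

import numpy as np

def reference_attention(q, k, v, mask=None):
    matmul_qk = q @ k.T                        # (seq_len_q, seq_len_k)
    dk = k.shape[-1]                           # depth of the key vectors
    logits = matmul_qk / np.sqrt(dk)
    if mask is not None:
        logits += (1.0 - mask) * -1e9          # masked slots -> huge negative -> ~0 after softmax
    # softmax over the last axis (seq_len_k) so each row sums to 1
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return weights @ v, weights

q = np.array([[1., 0., 1., 1.],
              [0., 1., 1., 1.],
              [1., 0., 0., 1.]])
k = np.array([[1., 1., 0., 1.],
              [1., 0., 1., 1.],
              [0., 1., 1., 0.],
              [0., 0., 0., 1.]])
v = np.array([[0., 0.],
              [1., 0.],
              [1., 0.],
              [1., 1.]])

output, weights = reference_attention(q, k, v)
print(weights)   # first row comes out ~ [0.2589 0.4269 0.1571 0.1571]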
OUTPUT:
k =
[[1. 1. 0. 1.]
[1. 0. 1. 1.]
[0. 1. 1. 0.]
[0. 0. 0. 1.]]
q =
[[1. 0. 1. 1.]
[0. 1. 1. 1.]
[1. 0. 0. 1.]]
v =
[[0. 0.]
[1. 0.]
[1. 0.]
[1. 1.]]
matmul_qk =
tf.Tensor(
[[1. 2. 1. 2.]
[1. 1. 2. 2.]
[1. 1. 0. 2.]], shape=(3, 4), dtype=float32)
dk =
4
scaled_attention_logits =
tf.Tensor(
[[0.5 1. 0.5 1. ]
[0.5 0.5 1. 1. ]
[0.5 0.5 0. 1. ]], shape=(3, 4), dtype=float32)
mask =
None
attention_weights =
tf.Tensor(
[[0.18877034 0.31122968 0.18877034 0.31122968]
[0.18877034 0.18877034 0.31122968 0.31122968]
[0.23500371 0.23500371 0.14253695 0.3874556 ]], shape=(3, 4), dtype=float32)
output =
tf.Tensor(
[[2.5 1. ]
[2.5 1. ]
[1.5 1. ]], shape=(3, 2), dtype=float32)
ERROR:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-40-00665b20febb> in <module>
1 # UNIT TEST
----> 2 scaled_dot_product_attention_test(scaled_dot_product_attention)
~/work/W4A1/public_tests.py in scaled_dot_product_attention_test(target)
60 assert np.allclose(weights, [[0.2589478, 0.42693272, 0.15705977, 0.15705977],
61 [0.2772748, 0.2772748, 0.2772748, 0.16817567],
---> 62 [0.33620113, 0.33620113, 0.12368149, 0.2039163 ]])
63
64 assert tf.is_tensor(attention), "Output must be a tensor"
AssertionError:
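For what it’s worth, hand-computing what the test seems to expect for the first row of attention_weights (again assuming the QK^T form of the formula, with q[0] dotted against each row of k):

import numpy as np
row = np.array([2., 3., 1., 1.]) / np.sqrt(4.)   # q[0] . k[i] for each row of k, scaled by sqrt(dk)
print(np.exp(row) / np.exp(row).sum())           # ~ [0.2589 0.4269 0.1571 0.1571]

That reproduces the expected first row in the assert, but the first row of my printed matmul_qk is [1. 2. 1. 2.], so the divergence seems to start at the very first matmul step.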