My scaled_dot_product_attention function is returning attention_weights values that don’t match what the unit test expects, so the assert fails.
Below is my code, with the actual operations commented out to observe the Honor Code, followed by my printed output and the unit test error.
Clearly my values aren’t matching up, but I don’t see why, since in the first test case the mask isn’t even used.
CODE:
# START CODE HERE
print("k =\n", k)
print("q =\n", q)
print("v =\n", v)
matmul_qk = # use the TensorFlow matrix multiplier on q and k
print("matmul_qk =\n", matmul_qk)
# scale matmul_qk
dk = # get the seq_len_k value from the shape of k
scaled_attention_logits = # divide the matmul_qk value by the numpy sqrt of dk
print("dk =\n", dk)
print("scaled_attention_logits =\n", scaled_attention_logits)
print("mask =\n", mask)
# add the mask to the scaled tensor.
if mask is not None: # Don't replace this None
    scaled_attention_logits += # add (1 - mask) * -1e9, as described
# softmax is normalized on the last axis (seq_len_k) so that the scores
# add up to 1.
attention_weights = # use the tensorflow softmax on the logits
print("attention_weights =\n", attention_weights)
output = # again use the tf matrix multiplier on the logits and v
print("output =\n", output)
# END CODE HERE
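For reference, here is a small standalone NumPy sketch of the published formula Attention(Q, K, V) = softmax(QK^T / sqrt(dk)) V, applied to the same printed inputs. I wrote it from scratch against the paper, so it is only the textbook recipe, not my graded TensorFlow code, and it assumes dk is the depth of the key vectors (which coincides with seq_len_k here, since k is 4x4):

import numpy as np

def reference_attention(q, k, v, mask=None):
    matmul_qk = q @ k.T                        # (seq_len_q, seq_len_k)
    dk = k.shape[-1]                           # depth of the key vectors
    logits = matmul_qk / np.sqrt(dk)
    if mask is not None:
        logits += (1.0 - mask) * -1e9          # masked slots -> huge negative -> ~0 after softmax
    # softmax over the last axis (seq_len_k) so each row sums to 1
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return weights @ v, weights

q = np.array([[1., 0., 1., 1.],
              [0., 1., 1., 1.],
              [1., 0., 0., 1.]])
k = np.array([[1., 1., 0., 1.],
              [1., 0., 1., 1.],
              [0., 1., 1., 0.],
              [0., 0., 0., 1.]])
v = np.array([[0., 0.],
              [1., 0.],
              [1., 0.],
              [1., 1.]])

output, weights = reference_attention(q, k, v)
print(weights)   # first row comes out ~ [0.2589 0.4269 0.1571 0.1571]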
OUTPUT:
k =
[[1. 1. 0. 1.]
[1. 0. 1. 1.]
[0. 1. 1. 0.]
[0. 0. 0. 1.]]
q =
[[1. 0. 1. 1.]
[0. 1. 1. 1.]
[1. 0. 0. 1.]]
v =
[[0. 0.]
[1. 0.]
[1. 0.]
[1. 1.]]
matmul_qk =
tf.Tensor(
[[1. 2. 1. 2.]
[1. 1. 2. 2.]
[1. 1. 0. 2.]], shape=(3, 4), dtype=float32)
dk =
4
scaled_attention_logits =
tf.Tensor(
[[0.5 1. 0.5 1. ]
[0.5 0.5 1. 1. ]
[0.5 0.5 0. 1. ]], shape=(3, 4), dtype=float32)
mask =
None
attention_weights =
tf.Tensor(
[[0.18877034 0.31122968 0.18877034 0.31122968]
[0.18877034 0.18877034 0.31122968 0.31122968]
[0.23500371 0.23500371 0.14253695 0.3874556 ]], shape=(3, 4), dtype=float32)
output =
tf.Tensor(
[[2.5 1. ]
[2.5 1. ]
[1.5 1. ]], shape=(3, 2), dtype=float32)
ERROR:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-40-00665b20febb> in <module>
1 # UNIT TEST
----> 2 scaled_dot_product_attention_test(scaled_dot_product_attention)
~/work/W4A1/public_tests.py in scaled_dot_product_attention_test(target)
60 assert np.allclose(weights, [[0.2589478, 0.42693272, 0.15705977, 0.15705977],
61 [0.2772748, 0.2772748, 0.2772748, 0.16817567],
---> 62 [0.33620113, 0.33620113, 0.12368149, 0.2039163 ]])
63
64 assert tf.is_tensor(attention), "Output must be a tensor"
AssertionError:
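For what it’s worth, hand-computing what the test seems to expect for the first row of attention_weights (again assuming the QK^T form of the formula, with q[0] dotted against each row of k):

import numpy as np
row = np.array([2., 3., 1., 1.]) / np.sqrt(4.)   # q[0] . k[i] for each row of k, scaled by sqrt(dk)
print(np.exp(row) / np.exp(row).sum())           # ~ [0.2589 0.4269 0.1571 0.1571]

That reproduces the expected first row in the assert, but the first row of my printed matmul_qk is [1. 2. 1. 2.], so the divergence seems to start at the very first matmul step.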