DLS Course 5 Week 4: scaled dot product attention

Hi, I am having issues with the final exercise of W4. I am getting the following error, and the grader output does not give much detail.

AssertionError Traceback (most recent call last)
in
1 # UNIT TEST
----> 2 scaled_dot_product_attention_test(scaled_dot_product_attention)

~/work/W4A1/public_tests.py in scaled_dot_product_attention_test(target)
60 assert np.allclose(weights, [[0.2589478, 0.42693272, 0.15705977, 0.15705977],
61 [0.2772748, 0.2772748, 0.2772748, 0.16817567],
—> 62 [0.33620113, 0.33620113, 0.12368149, 0.2039163 ]])
63
64 assert tf.is_tensor(attention), "Output must be a tensor"

AssertionError:

I printed both the output and the attention weights variables, and both are tf.Tensors.

tf.Tensor(
[[0.7464066 0.23822893]
[0.7461551 0.23846523]
[0.7383507 0.24579675]], shape=(3, 2), dtype=float32)
tf.Tensor(
[[0.2535934 0.26994875 0.23822893 0.23822893]
[0.25384492 0.25384492 0.25384492 0.23846523]
[0.26164928 0.26164928 0.23090468 0.24579675]], shape=(3, 4), dtype=float32)

Do you have an idea of what could be wrong?

This is my implementation:

matmul_qk = tf.linalg.matmul(q, k, transpose_b=True) # (…, seq_len_q, seq_len_k)

dk = k.shape[-2]
scaled_attention_logits = tf.divide(matmul_qk,dk**2)

if mask is not None: # Don't replace this None
    scaled_attention_logits += (1.0-mask)*-1e9 

attention_weights = tf.keras.activations.softmax(scaled_attention_logits) 
output = tf.linalg.matmul(attention_weights,v)

Can you post your implementation of scaled attention?

You have a few mistakes. Read the instructions very carefully again. For example, d_k is the embedding dimension of each head, i.e., the last dimension of the key (d_model / h). The scaling is done using the square ROOT of d_k, not d_k squared. 🙂
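To make those two fixes concrete, here is a minimal sketch of the corrected computation. It is not the official notebook solution; it just assumes the signature from your post (q, k, v, mask) and the same mask convention (mask holds 1 for positions to keep), so double-check both against the notebook instructions:

import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # d_k is the LAST dimension of the key; divide by its square root
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    if mask is not None:
        # push masked-out positions to a large negative value before softmax
        scaled_attention_logits += (1.0 - mask) * -1e9

    # softmax over the key axis, so each query's weights sum to 1
    attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)
    output = tf.matmul(attention_weights, v)  # (..., seq_len_q, d_v)
    return output, attention_weights

This also explains the symptom you saw: dividing by d_k squared (with d_k taken from the wrong axis) makes the logits much smaller than intended, so the softmax comes out nearly uniform, which is why your printed weights are all close to 0.25 instead of the expected values.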

Note that the failing assert is the one on line 62, i.e., the attention weights have the wrong values. It is not about the values being tensors; that check is on line 64.