W4 A1 | Ex-3 | Scaled Dot Product Attention

I am trying to build the function scaled_dot_product_attention(q, k, v, mask)

- Did a matmul between q and the transpose of k
- Determined the size of dk using dk.size
- Computed scaled_attention_logits by dividing the matmul by the sqrt of dk.size
- Added mask*1e-9 to scaled_attention_logits
- Computed attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis = -1)
- Did a matmul between attention_weights and v

I spent quite a lot of time on this, testing it on randomly generated arrays, and the results looked absolutely fine. But the unit test fails with: assert tf.is_tensor(attention), "Output must be a tensor"

What could possibly go wrong here? I checked the type of attention_weights, and it showed a tensor. Really frustrated.
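For comparison with the steps listed above, here is a minimal NumPy sketch of the same computation, meant only for checking values on small arrays. The names `scaled_dot_product_attention_np` and `softmax` are hypothetical, and it assumes a mask where 1 marks a key position to hide (if your mask uses 1 for positions to keep, you'd add `(1 - mask) * -1e9` instead). Because it uses NumPy ops it returns NumPy arrays, not tf tensors, so it would fail a `tf.is_tensor` check; in the notebook you'd use the tf operations instead.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)


def scaled_dot_product_attention_np(q, k, v, mask=None):
    """Hypothetical NumPy sketch, not the graded TensorFlow function.

    q: (..., seq_len_q, depth), k: (..., seq_len_k, depth), v: (..., seq_len_k, depth_v)
    mask: broadcastable to (..., seq_len_q, seq_len_k); ASSUMED 1 = hide this key.
    """
    # Step 1: Q K^T, transposing only the last two axes of k.
    matmul_qk = q @ np.swapaxes(k, -1, -2)
    # Step 2: dk is the key depth (size of k's last axis), not k.size.
    dk = float(k.shape[-1])
    # Step 3: scale the logits by sqrt(dk).
    scaled_attention_logits = matmul_qk / np.sqrt(dk)
    # Step 4: add mask * -1e9 (not mask * 1e-9) so hidden keys get ~zero weight.
    if mask is not None:
        scaled_attention_logits = scaled_attention_logits + mask * -1e9
    # Step 5: softmax over the last axis (seq_len_k), so each row sums to 1.
    attention_weights = softmax(scaled_attention_logits, axis=-1)
    # Step 6: weighted sum of the values.
    output = attention_weights @ v
    return output, attention_weights


rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 2))
out, weights = scaled_dot_product_attention_np(q, k, v)
print(out.shape, weights.shape)  # (3, 2) (3, 5)
```

Checking that each row of the weights sums to 1, and that masked columns come out as zero, catches most of the mistakes discussed in this thread.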

Did you use the tf functions?
Not sure about the size of dk

@Andreas , I did and was able to troubleshoot the problem afterward. Thanks!

I have the same problem. So how did you solve it?

assert np.allclose(weights, [[0.2589478, 0.42693272, 0.15705977, 0.15705977],
[0.2772748, 0.2772748, 0.2772748, 0.16817567],
[0.33620113, 0.33620113, 0.12368149, 0.2039163 ]])
In the unit test, the assert statement above has no error message, so it can be misleading about where the actual error happens. When you see the error "Output must be a tensor", it could be because the previous np.allclose assert failed, not the tf.is_tensor one.
Possibly your attention weights are incorrect.
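To illustrate that point with a hypothetical stand-in for the test (`run_checks` and `failure_message` are made up for this sketch): an assert without a message raises an `AssertionError` whose text is empty, so a message printed near it in the traceback can belong to the next assert rather than the one that actually failed.

```python
import numpy as np


def run_checks(weights, attention_is_tensor):
    # Hypothetical stand-in for the unit test: the first assert has no message.
    assert np.allclose(weights, [0.25, 0.75])
    assert attention_is_tensor, "Output must be a tensor"


def failure_message(weights, attention_is_tensor):
    # Return the text of the AssertionError, or None if every check passes.
    try:
        run_checks(weights, attention_is_tensor)
    except AssertionError as err:
        return str(err)
    return None


# Wrong weights: the message is empty, because the message-less assert failed.
print(repr(failure_message([0.5, 0.5], attention_is_tensor=True)))  # ''
# Right weights but not a tensor: now the message appears.
print(repr(failure_message([0.25, 0.75], attention_is_tensor=False)))  # 'Output must be a tensor'
```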


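Make sure dk is the right size: it should be the depth of the keys, i.e. the size of the last dimension of k (seq_len_k only gives the same number when the two happen to be equal).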
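A quick NumPy check with hypothetical shapes where seq_len_k and the key depth differ shows why it matters which axis dk is read from (the thread later confirms dk is the last dimension of q and k). It also shows that `.size` counts every element, so it isn't a dimension at all.

```python
import numpy as np

# Hypothetical shapes where seq_len_k and the key depth differ.
seq_len_q, seq_len_k, depth = 3, 5, 8
q = np.ones((seq_len_q, depth))
k = np.ones((seq_len_k, depth))

dk = k.shape[-1]         # key depth: the size of k's last axis
wrong_dk = k.shape[-2]   # seq_len_k: only equals dk when the two sizes coincide
total = k.size           # total number of elements, not a dimension
print(dk, wrong_dk, total)  # 8 5 40

# q and k must agree on the last axis for Q K^T to work,
# so q.shape[-1] gives the same dk.
```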


Dude, I have been stuck on this part for the last 4 hours. If someone can, please help and save my laptop from being thrown out of the window. This is what I am doing:

(Solution code removed by staff as sharing it publicly is against the Code of Honour)

Please let me know where I am messing up.

In step 4 you have mask * 1e-9, but it should be mask * -1e9.
I had the same problem; the misplaced minus sign was very difficult to spot.
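Here's a tiny NumPy sketch of why that sign matters, using one hypothetical row of scaled logits and assuming a mask where 1 means "hide this key". Adding `mask * 1e-9` nudges the logit by a negligible positive amount, so the masked key keeps its weight, while `mask * -1e9` drives its softmax weight to essentially zero. If your mask instead marks kept positions with 1, the equivalent term is `(1 - mask) * -1e9`.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


logits = np.array([1.0, 1.5, 0.5, 0.5])  # hypothetical scaled logits for one query
mask = np.array([0.0, 0.0, 1.0, 0.0])    # ASSUMED convention: 1 = hide this key

wrong = softmax(logits + mask * 1e-9)    # adds a tiny positive number: weights barely move
right = softmax(logits + mask * -1e9)    # huge negative logit: the hidden key gets ~0 weight

print(np.round(wrong, 4))
print(np.round(right, 4))
```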


Does anyone still have a problem with implementing the attention function?


Answer: Take the size of the last dimension of k (for the 2-D arrays in the test, that's the second dimension) 😃
You are not actually getting the "Output must be a tensor" error. If you print each result, most of them are tensors. The failing line is the one before the "Output must be a tensor" assert: your weights have the wrong values. 😃


Hi moody, this is where I got confused. Shouldn’t dk be a scalar?


Yes, I was confused by the quantity dk. It should be the last dimension of q and k, am I wrong?

Yes, that's correct. In the assignment, dk is the dimensionality of the query and key vectors. We then compute the attention logits by dividing the dot product of the queries and keys by the square root of dk, and take the softmax of those logits to get the attention weights.
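Written out, the computation discussed in this thread is standard scaled dot-product attention. Here d_k is the query/key depth, the softmax runs over the key axis, and M is the mask term (-10^9 at hidden positions, 0 elsewhere):

```latex
\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}} + M\right) V,
\qquad
M_{ij} =
\begin{cases}
-10^{9} & \text{if key } j \text{ is masked for query } i,\\
0 & \text{otherwise.}
\end{cases}
```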

Thank you! Now I got it right.


I had a mistake where I was multiplying by -1e9 in the wrong place, so check where that factor goes when you apply the mask.

Could anyone tell me what the mistake is? I performed the same steps but still got this error:

AssertionError Traceback (most recent call last)
30 print("\033[92mAll tests passed")
---> 32 scaled_dot_product_attention_test(scaled_dot_product_attention)

in scaled_dot_product_attention_test(target)
23 assert np.allclose(weights, [[0.30719590187072754, 0.5064803957939148, 0.0, 0.18632373213768005],
24 [0.3836517333984375, 0.3836517333984375, 0.0, 0.2326965481042862],
---> 25 [0.3836517333984375, 0.3836517333984375, 0.0, 0.2326965481042862]]), "Wrong masked weights"
26 assert np.allclose(attention, [[0.6928040981292725, 0.18632373213768005],
27 [0.6163482666015625, 0.2326965481042862],

AssertionError: Wrong masked weights
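If the masked weights come out wrong even with the -1e9 factor, one thing worth checking is which convention your mask uses: whether 1 marks a position to hide or a position to keep. The two conventions need `mask * -1e9` and `(1 - mask) * -1e9` respectively. One way to debug is to reproduce the expected numbers outside the notebook. The q, k, v arrays below are hypothetical inputs that happen to reproduce the expected values in the traceback (they are not quoted from the test), and the mask here ASSUMES 1 = hide.

```python
import numpy as np

# Hypothetical inputs that reproduce the expected values in the traceback above.
q = np.array([[1, 0, 1, 1], [0, 1, 1, 1], [1, 0, 0, 1]], dtype=np.float64)
k = np.array([[1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 0], [0, 0, 0, 1]], dtype=np.float64)
v = np.array([[0, 0], [1, 0], [1, 0], [1, 1]], dtype=np.float64)
# ASSUMED convention: 1 hides a key position (here the third key, for every query).
mask = np.array([[0, 0, 1, 0]] * 3, dtype=np.float64)

logits = q @ k.T / np.sqrt(k.shape[-1]) + mask * -1e9
logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
e = np.exp(logits)
weights = e / e.sum(axis=-1, keepdims=True)
attention = weights @ v

expected_weights = [[0.30719590187072754, 0.5064803957939148, 0.0, 0.18632373213768005],
                    [0.3836517333984375, 0.3836517333984375, 0.0, 0.2326965481042862],
                    [0.3836517333984375, 0.3836517333984375, 0.0, 0.2326965481042862]]
print(np.allclose(weights, expected_weights))
```

Printing `weights` next to your notebook's values shows quickly whether the masked column is the one that differs.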


+1, although I did add that message to the assert statement myself while debugging.

How did you solve it? Please tell me.

@isuru @MuskaanManocha @ee18btech11012

Guys, there are 3 typical mistakes in this section, so make sure you get them right:

  • dk is the depth of the keys, i.e. the size of the last dimension of k (not necessarily seq_len_k).
  • multiply the mask by -1e9 before adding it to the scaled tensor.
  • use tf.matmul to compute the output.

I think what helped me was adding transpose_b=True to the matmul_qk calculation instead of explicitly transposing k. I'm not 100% sure, because I made a few other small changes too. But thanks a lot, mentor!
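On the transpose_b point: `tf.matmul(q, k, transpose_b=True)` transposes only the last two dimensions of k, which keeps it correct when there are batch dimensions, whereas a full transpose (like NumPy's `k.T`) reverses every axis. Here's a NumPy analogue with hypothetical shapes, using `np.swapaxes` to transpose just the last two axes and comparing against a per-batch loop.

```python
import numpy as np

rng = np.random.default_rng(1)
batch, seq_len_q, seq_len_k, depth = 2, 3, 5, 4   # hypothetical shapes
q = rng.normal(size=(batch, seq_len_q, depth))
k = rng.normal(size=(batch, seq_len_k, depth))

# Transpose only the last two axes of k: (batch, depth, seq_len_k).
# NumPy analogue of tf.matmul(q, k, transpose_b=True).
logits = q @ np.swapaxes(k, -1, -2)

# Reference: one ordinary 2-D product per batch element.
reference = np.stack([q[b] @ k[b].T for b in range(batch)])
print(logits.shape)  # (2, 3, 5)
print(np.allclose(logits, reference))

# k.T reverses *all* axes: (depth, seq_len_k, batch), which doesn't line up with q.
print(k.T.shape)  # (4, 5, 2)
```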