I am trying to build the function scaled_dot_product_attention(q, k, v, mask)
- did a matmul between q and the transpose of k
- determined the size of dk using .size
- computed scaled_attention_logits by dividing the matmul by the square root of dk
- added mask * 1e-9 to scaled_attention_logits
- computed attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)
- did a matmul between attention_weights and v
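For reference, here is a minimal sketch of these steps following the public TensorFlow Transformer tutorial (not the assignment's graded solution; the mask convention below is an assumption and may differ from yours). Note in particular that the tutorial scales the mask by -1e9, a large negative number, rather than 1e-9:

import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask):
    # Dot product of the queries with the transposed keys: (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(dk), where dk is the size of the last dimension of k.
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Push masked positions towards -infinity so softmax gives them ~0 weight.
    # Some assignment versions use (1. - mask) * -1e9 instead, depending on
    # whether mask == 1 marks positions to keep or to drop.
    if mask is not None:
        scaled_attention_logits += (mask * -1e9)

    # Softmax over the last axis, so each query row's weights sum to 1.
    attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)

    # Weighted sum of the values: (..., seq_len_q, depth_v)
    output = tf.matmul(attention_weights, v)
    return output, attention_weights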
I spent quite a lot of time on this, generating random arrays, and the results were absolutely fine. But the unit test is failing with the error: assert tf.is_tensor(attention), “Output must be a tensor”
What could possibly be going wrong here? I checked the type of attention_weights and it showed it to be a tensor. Really frustrated.
assert np.allclose(weights, [[0.2589478, 0.42693272, 0.15705977, 0.15705977],
[0.2772748, 0.2772748, 0.2772748, 0.16817567],
[0.33620113, 0.33620113, 0.12368149, 0.2039163 ]])
In the unit test, the assert statement above doesn’t include an error message, so it can be misleading about where the failure actually happens. When you see the error “Output must be a tensor”, it may be because you didn’t pass the earlier np.allclose assert, rather than the tf.is_tensor one.
Possibly your attention weights are incorrect.
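One way to see this locally (a hedged sketch, assuming eager execution and that the test’s q, k, v, mask are in scope): print the weights and compare them by eye against the expected matrix above. Note that softmax rows always sum to 1, so that property alone won’t catch a wrong mask scale.

import numpy as np
import tensorflow as tf

output, attention_weights = scaled_dot_product_attention(q, k, v, mask)

print(tf.is_tensor(output))                    # True, so the is_tensor assert is not the problem
print(np.round(attention_weights.numpy(), 8))  # compare against the expected matrix

# This passes even when the mask is wrong -- softmax rows sum to 1 regardless.
print(np.allclose(attention_weights.numpy().sum(axis=-1), 1.0))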
Dude, I have been stuck at this part for the last 4 hours. If someone can save my laptop from being thrown out of the window, please help. This is what I am doing -
(Solution code removed by staff as sharing it publicly is against the Code of Honour)
Answer: Choose the second dimension of the k array and get the size
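A minimal sketch of that hint, assuming k is a 2-D array as in the unit test (so its second dimension is also its last axis) and matmul_qk is already computed:

dk = tf.cast(tf.shape(k)[-1], tf.float32)  # size of k's second (last) dimension
scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)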
You are not really failing the “Output must be a tensor” check. If you print each result, most of them are tensors. The failure is on the line before the “Output must be a tensor” assert: the values of the weights are wrong.
Yes, that’s correct. In the assignment, dk is the dimensionality of the query and key vectors. We get the attention weights by dividing the dot product between query and key by the square root of dk and then applying the softmax.
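For reference, this is the standard scaled dot-product attention from “Attention Is All You Need”:

Attention(Q, K, V) = softmax(Q · Kᵀ / √dk) · V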
I think switching to transpose_b=True in the matmul_qk calculation, instead of explicitly transposing the k values, is what helped me. Not 100% sure, because I also made a few other small changes, but thanks a lot, mentor!
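For anyone else landing here, the two forms below are numerically equivalent for 2-D q and k; transpose_b=True just handles the transpose inside the matmul kernel (a sketch, not the graded solution):

matmul_qk = tf.matmul(q, k, transpose_b=True)   # transpose handled inside matmul

# Equivalent for 2-D tensors; batched tensors would instead need
# tf.transpose(k, perm=[0, 2, 1]).
matmul_qk = tf.matmul(q, tf.transpose(k))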