I am trying to build the function scaled_dot_product_attention(q, k, v, mask)
- did a matmul between q and the transpose of k
- determined the size of dk using .size
- computed scaled_attention_logits by dividing the matmul by the square root of dk
- added mask * 1e-9 to scaled_attention_logits
- computed attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)
- did a matmul between attention_weights and v
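For reference, here is a minimal sketch of these steps following the public TensorFlow Transformer tutorial (not the assignment's graded solution; the mask convention below is an assumption and may differ from yours). Note in particular that the tutorial scales the mask by -1e9, a large negative number, rather than 1e-9:

import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask):
    # Dot product of the queries with the transposed keys: (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(dk), where dk is the size of the last dimension of k.
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Push masked positions towards -infinity so softmax gives them ~0 weight.
    # Some assignment versions use (1. - mask) * -1e9 instead, depending on
    # whether mask == 1 marks positions to keep or to drop.
    if mask is not None:
        scaled_attention_logits += (mask * -1e9)

    # Softmax over the last axis, so each query row's weights sum to 1.
    attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)

    # Weighted sum of the values: (..., seq_len_q, depth_v)
    output = tf.matmul(attention_weights, v)
    return output, attention_weights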
I spent quite a lot of time on this, generating random arrays, and the results were absolutely fine. But the unit test is failing with the error: assert tf.is_tensor(attention), “Output must be a tensor”
What could possibly be going wrong here? I checked the type of attention_weights and it showed it to be a tensor. Really frustrated.
assert np.allclose(weights, [[0.2589478, 0.42693272, 0.15705977, 0.15705977],
[0.2772748, 0.2772748, 0.2772748, 0.16817567],
[0.33620113, 0.33620113, 0.12368149, 0.2039163 ]])
In the unit test, the assert statement above doesn’t include an error message, so it can be misleading about where the failure actually happens. When you see the error “Output must be a tensor”, it may be because you didn’t pass the earlier np.allclose assert, rather than the tf.is_tensor one.
Possibly your attention weights are incorrect.
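One way to see this locally (a hedged sketch, assuming eager execution and that the test’s q, k, v, mask are in scope): print the weights and compare them by eye against the expected matrix above. Note that softmax rows always sum to 1, so that property alone won’t catch a wrong mask scale.

import numpy as np
import tensorflow as tf

output, attention_weights = scaled_dot_product_attention(q, k, v, mask)

print(tf.is_tensor(output))                    # True, so the is_tensor assert is not the problem
print(np.round(attention_weights.numpy(), 8))  # compare against the expected matrix

# This passes even when the mask is wrong -- softmax rows sum to 1 regardless.
print(np.allclose(attention_weights.numpy().sum(axis=-1), 1.0))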
Dude, I have been stuck at this part for the last 4 hours. If someone can save my laptop from being thrown out of the window, please help. This is what I am doing -
(Solution code removed by staff as sharing it publicly is against the Code of Honour)
Answer: Choose the second dimension of the k array and get the size
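A minimal sketch of that hint, assuming k is a 2-D array as in the unit test (so its second dimension is also its last axis) and matmul_qk is already computed:

dk = tf.cast(tf.shape(k)[-1], tf.float32)  # size of k's second (last) dimension
scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)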
You are not really failing the “Output must be a tensor” check. If you print each result, most of them are tensors. The failure is on the line before the “Output must be a tensor” assert: the values of the weights are wrong.
Yes, that’s correct. In the assignment, dk is the dimensionality of the query and key vectors. We get the attention weights by dividing the dot product between query and key by the square root of dk and then applying the softmax.
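For reference, this is the standard scaled dot-product attention from “Attention Is All You Need”:

Attention(Q, K, V) = softmax(Q · Kᵀ / √dk) · V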
I think switching to transpose_b=True in the matmul_qk calculation, instead of explicitly transposing the k values, is what helped me. Not 100% sure, because I also made a few other small changes, but thanks a lot, mentor!
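For anyone else landing here, the two forms below are numerically equivalent for 2-D q and k; transpose_b=True just handles the transpose inside the matmul kernel (a sketch, not the graded solution):

matmul_qk = tf.matmul(q, k, transpose_b=True)   # transpose handled inside matmul

# Equivalent for 2-D tensors; batched tensors would instead need
# tf.transpose(k, perm=[0, 2, 1]).
matmul_qk = tf.matmul(q, tf.transpose(k))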