C4W2_Assignment in Natural Language Processing with Attention

Hi, I keep failing the unit tests and the section test for scaled_dot_product_attention. For context, here is my logic:

1. Compute matmul_qk as the matrix product of q and k transposed.
2. Compute dk by casting tf.shape(k)[-1] to dtype=tf.float32.
3. Set the scaled attention logits to matmul_qk / tf.math.sqrt(dk).

I am not sure whether any of the steps above are wrong, but I assume the rest is implemented correctly: if a mask is provided, I take (1. - mask), multiply it by 1e9, and add the result to the scaled attention logits. I then feed that into the softmax activation, multiply the result with v, and return it.

For more info, here is my output:

@HASHEM_JABER I was getting the same results as you at one point… trying to remember what I did to fix it. Keep in mind that you want to use matmul, not the dot product, in this case.

Same with your output.

Also, keep in mind it should be -1e9, not 1e9.
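For anyone else who lands here, below is a minimal sketch of the steps described above, with the negative sign in the right place. The signature and the (output, attention_weights) return value follow the common convention for this function and may not match the assignment template exactly, so treat it as an illustration rather than the graded solution:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    """Minimal sketch of scaled dot-product attention (illustrative, not the graded solution)."""
    # Use matmul (with k transposed), not an elementwise product.
    matmul_qk = tf.matmul(q, k, transpose_b=True)  # (..., seq_len_q, seq_len_k)

    # Scale by sqrt(dk), where dk is the key depth cast to float32.
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Masked positions get a large NEGATIVE value so softmax drives them to ~0.
    # Note the sign: (1. - mask) * -1e9, not +1e9.
    if mask is not None:
        scaled_attention_logits += (1. - mask) * -1e9

    # Softmax over the key axis, then weight the values.
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    output = tf.matmul(attention_weights, v)
    return output, attention_weights
```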


@Nevermnd Thank you so much, the issue was indeed 1e9 having a positive sign instead of a negative one. Thanks for the heads up! :heart:
