Hi, for scaled_dot_product_attention I keep failing the unit tests as well as the section test. For context, here is my logic:
first I compute the matrix product of q and k transposed to produce matmul_qk,
then I scale matmul_qk by the square root of dk, where dk is tf.shape(k)[-1] cast to dtype=tf.float32, so the scaled attention logits become matmul_qk / tf.math.sqrt(dk).
I am not sure whether any of the above steps are wrong, but I assume I have implemented the following part correctly: if a mask is provided, it is first subtracted from 1, as in (1. - mask), then multiplied by 1e9, and the result is added to the scaled attention logits.
I then feed that to a softmax activation, multiply the resulting weights with v, and return the product.
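Written out as code, this is roughly what my function looks like. I'm assuming the usual scaled_dot_product_attention(q, k, v, mask) signature here and that both the output and the attention weights get returned; the variable names other than matmul_qk and dk are just what I happen to call things locally:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask):
    # step 1: matrix product of q and k transposed
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # step 2: scale by the square root of dk (depth of k, cast to float32)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # step 3: if a mask exists, add (1. - mask) * 1e9 to the logits,
    # exactly as described above
    if mask is not None:
        scaled_attention_logits += (1. - mask) * 1e9

    # step 4: softmax over the last axis, then weight v
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    output = tf.matmul(attention_weights, v)

    return output, attention_weights
```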
For more info, here is my output: