Hi @pongyuenlam
First of all, thank you for putting in the effort to understand what you are learning. As a mentor, it feels great to come across learners who put in sincere effort. Once you complete NLP, do complete the Deep Learning Specialisation. Believe me, you will not regret it.
Now, coming to your code, I will go step by step.
- After calculating self-attention for block 1 (your code was correct for that), you had to apply layer normalization (layernorm1) to the sum of the attention output and the input. You called the correct layer, but there are two mistakes: you did not need tf.add, and instead of summing the attention output and x with the simple addition operator, you passed ((mult_attn_out1, x)).
- Read the instruction for applying layer normalization here: you had to apply layer normalization to the sum of the attention output and the output of the first block, but you summed attention output 1 and attention output 2, which is incorrect. The mistake from the first point also applies: do not use tf.add; use the addition operator to add mult_attn_out2 and Q1, which is the output of the first block.
- BLOCK 3. The next instruction was to pass the output of the second block through the ffn, but you added a padding mask in block 3, which was not required here.
- Next, the code instruction mentions "apply a dropout layer to the ffn output" and "use training=training", but you missed adding training=training when applying the dropout layer to the ffn output.
- Again, to apply layer normalization (layernorm3) to the sum of the ffn output and the output of the second block, please remove tf.add and use the addition operator on the outputs mentioned in the instruction, which you chose correctly.
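To put the points above together, here is a minimal sketch of how the three blocks could fit. It is not the assignment's reference solution: the class name, the layer sizes, and the names mha1, mha2, layernorm2, dropout_ffn and enc_output are my own assumptions for illustration; only x, mult_attn_out1, mult_attn_out2, Q1, layernorm1, layernorm3 and ffn come from the discussion above.

```python
import tensorflow as tf


class DecoderLayerSketch(tf.keras.layers.Layer):
    """Illustrative decoder layer. Sizes and most attribute names are assumptions."""

    def __init__(self, embedding_dim=8, num_heads=2, fully_connected_dim=16,
                 dropout_rate=0.1, layernorm_eps=1e-6):
        super().__init__()
        # Two attention layers: self-attention and attention over the encoder output.
        self.mha1 = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embedding_dim, dropout=dropout_rate)
        self.mha2 = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embedding_dim, dropout=dropout_rate)
        # Small feed-forward network that maps back to embedding_dim.
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation="relu"),
            tf.keras.layers.Dense(embedding_dim),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=layernorm_eps)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=layernorm_eps)
        self.layernorm3 = tf.keras.layers.LayerNormalization(epsilon=layernorm_eps)
        self.dropout_ffn = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, enc_output, training=False,
             look_ahead_mask=None, padding_mask=None):
        # BLOCK 1: self-attention, then layernorm1 on (attention output + input),
        # summed with the plain addition operator.
        mult_attn_out1 = self.mha1(x, x, x, attention_mask=look_ahead_mask,
                                   training=training)
        Q1 = self.layernorm1(mult_attn_out1 + x)

        # BLOCK 2: attention over the encoder output, then layernorm2 on
        # (attention output + Q1), where Q1 is the output of the first block.
        mult_attn_out2 = self.mha2(Q1, enc_output, enc_output,
                                   attention_mask=padding_mask, training=training)
        mult_attn_out2 = self.layernorm2(mult_attn_out2 + Q1)

        # BLOCK 3: the ffn takes only the output of the second block (no mask).
        ffn_output = self.ffn(mult_attn_out2)
        # Dropout needs training=training so it is active only during training.
        ffn_output = self.dropout_ffn(ffn_output, training=training)
        # layernorm3 on (ffn output + output of the second block).
        out3 = self.layernorm3(ffn_output + mult_attn_out2)
        return out3
```

With, for example, x of shape (1, 3, 8) and enc_output of shape (1, 4, 8), calling the layer with training=False should return a tensor with the same shape as x.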
Regards
DP