C4W2_Assignment - Ex 7 Decoder Layer output

Hello everyone, I hope you are all doing well. I have a problem in the Week 2 assignment, specifically in Exercise 7 (the decoder layer), and I can't find its cause. I have already tried to understand it myself and searched dozens of forum threads for a solution. At this point I don't see what is wrong; everything looks right to me. Could anyone help me resolve this?

Failed test case: Wrong values in ‘attn_w_b2’. Check the call to self.mha2.
Expected: [0.34003818, 0.32569194, 0.33426988]
Got: [0.34083953 0.32673767 0.33242285]

Failed test case: Wrong values in ‘out’.
Expected: [1.1810006, -1.5600019, 0.41289005, -0.03388882]
Got: [ 1.3311304 -1.4207214 0.365438 -0.275847 ]

Failed test case: Wrong values in ‘out’ when we mask the last word. Are you passing the padding_mask to the inner functions?.
Expected: [1.1297308, -1.6106694, 0.32352272, 0.15741566]
Got: [ 1.3888907 -1.414115 0.2009444 -0.17572011]

Hi @efroes,

I was not able to recreate your error exactly, but I managed to get similar ones (same error messages, slightly different values). They occurred either when Q1 was not computed as the correct sum, or when the first argument in the call to self.mha2 (in block 2) was not the right one.

Assuming that your computation of
mult_attn_out1, attn_weights_block1 = ...
is correct (since you don't get any errors about attn_w_b1), the error is either in the computation of Q1, or in
mult_attn_out2, attn_weights_block2 =...
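For orientation, here is a minimal sketch of the generic two-block decoder-layer pattern using Keras layers. This is not the assignment's exact code: the hyperparameters are arbitrary, and the Keras attention_mask convention (True/1 means "attend") may differ from the masks the assignment builds, so treat it only as an illustration of where Q1 and the padding mask go.

```python
import tensorflow as tf

# Hypothetical minimal decoder layer; names (mha1, mha2, layernorm1, ...)
# mirror the assignment's, but hyperparameters and mask semantics may differ.
class MiniDecoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model=8, num_heads=2, dff=16, rate=0.1):
        super().__init__()
        self.mha1 = tf.keras.layers.MultiHeadAttention(num_heads, d_model // num_heads)
        self.mha2 = tf.keras.layers.MultiHeadAttention(num_heads, d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout_ffn = tf.keras.layers.Dropout(rate)

    def call(self, x, enc_output, training=False,
             look_ahead_mask=None, padding_mask=None):
        # Block 1: masked self-attention over the decoder input
        attn1, attn_w_b1 = self.mha1(x, x, x, attention_mask=look_ahead_mask,
                                     return_attention_scores=True)
        Q1 = self.layernorm1(attn1 + x)  # residual sum, then layernorm -> Q1

        # Block 2: cross-attention -- the query is Q1 from block 1, the
        # key/value come from the encoder, and the *padding* mask goes here
        attn2, attn_w_b2 = self.mha2(Q1, enc_output, enc_output,
                                     attention_mask=padding_mask,
                                     return_attention_scores=True)
        out2 = self.layernorm2(attn2 + Q1)

        # Block 3: feed-forward network, dropout only on the ffn output
        ffn_out = self.dropout_ffn(self.ffn(out2), training=training)
        return self.layernorm3(ffn_out + out2), attn_w_b1, attn_w_b2
```

The common failure modes the tests catch correspond to the wiring above: summing the wrong tensors before layernorm1, or passing something other than Q1 (or omitting the padding mask) in the self.mha2 call.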

When the code is completed correctly, Q1 is:

tf.Tensor(
[[[ 1.1767974 -0.35743523 -0.66738844 -2.068485 -0.43661404
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.66738844 -2.068485 -0.43661404
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.6673889 -2.0684855 -0.43661404
0.9106817 -0.91232944 0.621634 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.66738844 -2.068485 -0.43661404
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.66738844 -2.0684855 -0.43661404
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.3574357 -0.6673889 -2.068485 -0.4366145
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767969 -0.35743523 -0.6673889 -2.0684853 -0.4366145
0.9106817 -0.9123292 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.66738844 -2.068485 -0.4366145
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.66738844 -2.068485 -0.43661404
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370312
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.6673889 -2.0684853 -0.4366145
0.9106817 -0.9123292 0.6216345 -0.5611186 0.00370312
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.6673889 -2.0684853 -0.4366145
0.9106817 -0.9123292 0.6216345 -0.5611186 0.00370359
1.6866474 0.6039038 ]
[ 1.1767974 -0.3574357 -0.6673889 -2.0684853 -0.43661404
0.9106822 -0.91232944 0.621634 -0.5611186 0.00370359
1.6866465 0.6039038 ]
[ 1.1767979 -0.35743523 -0.6673889 -2.0684853 -0.4366145
0.9106817 -0.9123292 0.6216345 -0.5611186 0.00370359
1.6866465 0.6039038 ]
[ 1.1767969 -0.35743523 -0.6673889 -2.0684853 -0.43661404
0.9106817 -0.91232896 0.6216345 -0.5611186 0.00370312
1.6866469 0.6039038 ]
[ 1.1767979 -0.35743523 -0.6673889 -2.0684853 -0.4366145
0.9106822 -0.91232896 0.6216345 -0.5611186 0.00370312
1.6866469 0.6039038 ]]], shape=(1, 15, 12), dtype=float32)

You can check if yours is the same to better locate the error.

Edit: it is Exercise 2 - DecoderLayer in the section 7.1 - Decoder Layer, right? (Not exercise 7)

Hi @Anna_Kay, after some hours I managed to solve the problem. It really is a small detail that we have to get right to pass this exercise. Thank you so much!


Hello @efroes @Anna_Kay
I am getting the same error. How did you solve this issue?

Failed test case: Wrong values in ‘attn_w_b2’. Check the call to self.mha2.
Expected: [0.34003818, 0.32569194, 0.33426988]
Got: [0.34083953 0.32673767 0.33242285]

Failed test case: Wrong values in ‘out’.
Expected: [1.1810006, -1.5600019, 0.41289005, -0.03388882]
Got: [ 1.3311304 -1.4207214 0.365438 -0.275847 ]

Failed test case: Wrong values in ‘out’ when we mask the last word. Are you passing the padding_mask to the inner functions?.
Expected: [1.1297308, -1.6106694, 0.32352272, 0.15741566]
Got: [ 1.3888907 -1.414115 0.2009444 -0.17572011]

Hi @Sayed_Shahid_Hussain!

I did not encounter this error myself, nor did I manage to recreate it, so I cannot point you directly to the solution.

Perhaps checking the value of Q1 that you should be getting (see the post above) will help you locate the error, i.e. whether it occurs before or after that line.

Best


Did you get any other error?

It is stating that you have missed the padding mask, so check whether the mask is passed correctly in the call to self.mha2.

Yes, I have passed the padding mask to the function, yet that hasn't resolved the issue. The function runs and matches the expected output, but it still fails the unit test with the error given above.

Is this error from the DecoderLayer graded cell?

@Sayed_Shahid_Hussain

You can share a screenshot of the DecoderLayer code cell by personal DM, so your code can be cross-checked.


Thank you @Anna_Kay I am trying to debug the error.

Alright, thanks for your response. I will share it in a while.

Hello @Sayed_Shahid_Hussain

Issues with your code

  1. For Block 1 and Block 2, the only dropout the instructions mention is "Dropout will be applied during training (~1 line)". Adding dropout to mult_attn_out1 and mult_attn_out2 was not required; those two steps need to be removed.

  2. The training argument is only used where the instructions say to "apply a dropout layer to the ffn output", nowhere else (remove it from any other place where you have used it).

  3. For the line
    "apply layer normalization (layernorm1) to the sum of the attention output and the input (~1 line)"
    the result should be Q1.
    Using skip1 instead of Q1 is not an error by itself, but I would advise writing code only where None is given or where you are asked to write it. Notice that the Block 2 comment says to "calculate self-attention using the Q from the first block" (that Q is Q1); renaming it to something of your own can create issues while debugging, and as far as I can see this renaming caused your next major errors.

  4. For the line
    "pass the output of the second block through a ffn"
    you wrote ffn_output = self.ffn(skip2).
    Your skip2 is the layer normalization of the sum of the attention output and the output of the first block, but the instruction asks you to pass the output of the second block, i.e. mult_attn_out2; using skip2 here is incorrect.

  5. For the line
    "apply layer normalization (layernorm3) to the sum of the ffn output and the output of the second block"
    you wrote out3 = self.layernorm3(skip2 + ffn_output).
    The instruction asks you to use ffn_output and the output of the second block, i.e. mult_attn_out2, but you have used skip2, which is incorrect.
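To make the final block's wiring from the points above concrete, here is a hypothetical standalone sketch. The variable names (out2, ffn_output, out3) and shapes are illustrative stand-ins, not the assignment's code; out2 represents whatever tensor your Block 2 produces. The point is that the ffn consumes the Block 2 output and layernorm3 adds the ffn output back to that same tensor, not to some earlier skip variable.

```python
import tensorflow as tf

d_model, dff = 4, 8

# Illustrative stand-ins for the layers the decoder layer would own
ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(dff, activation="relu"),  # expansion
    tf.keras.layers.Dense(d_model),                 # projection back to d_model
])
layernorm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
dropout_ffn = tf.keras.layers.Dropout(0.1)

out2 = tf.random.uniform((1, 3, d_model))  # stand-in for the Block 2 output

ffn_output = ffn(out2)                                # ffn takes the Block 2 output
ffn_output = dropout_ffn(ffn_output, training=False)  # dropout only on the ffn output
out3 = layernorm3(ffn_output + out2)                  # residual uses that same tensor
```

Feeding an older residual variable into either the ffn or the final layernorm (as in points 4 and 5) breaks this pattern and shifts the output values, which is consistent with the "Wrong values in 'out'" failures.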

Regards
DP


Hello @Deepti_Prasad

Thank you so much for taking the time to review my code and provide detailed instructions on how to address the issues with the DecoderLayer class error. I truly appreciate your support.
I will diligently work through your instructions to ensure that the issues are resolved according to your guidance. Your assistance is invaluable to me, and I am grateful for your expertise.

Best

Let me know once issue is resolved.

Keep Learning!!!

Regards
DP
