C4W2_Assignment - Ex 7 Decoder Layer output

Hello everyone, I hope you are all doing well. I have a problem in the Week 2 assignment, specifically in Exercise 7 (the decoder layer), and I can't find its cause. I have already tried to understand it myself and searched dozens of forum threads for a solution. At this point I don't see what is wrong; everything looks right to me. Could anyone help me resolve this?

Failed test case: Wrong values in ‘attn_w_b2’. Check the call to self.mha2.
Expected: [0.34003818, 0.32569194, 0.33426988]
Got: [0.34083953 0.32673767 0.33242285]

Failed test case: Wrong values in ‘out’.
Expected: [1.1810006, -1.5600019, 0.41289005, -0.03388882]
Got: [ 1.3311304 -1.4207214 0.365438 -0.275847 ]

Failed test case: Wrong values in ‘out’ when we mask the last word. Are you passing the padding_mask to the inner functions?.
Expected: [1.1297308, -1.6106694, 0.32352272, 0.15741566]
Got: [ 1.3888907 -1.414115 0.2009444 -0.17572011]

Hi @efroes,

I was not able to recreate your error exactly, but I managed to get similar ones (same error messages, slightly different values). They occurred either when Q1 was not computed as the correct sum, or when the first argument in the call to self.mha2 (in block 2) was not the right one.

Assuming that your computation of
mult_attn_out1, attn_weights_block1 = ...
is correct (since you don't get any errors about attn_w_b1), the error is either in the computation of Q1, or in
mult_attn_out2, attn_weights_block2 =...
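For orientation, here is a minimal sketch of the generic two-block decoder-layer pattern using Keras layers. This is not the assignment's exact code: the hyperparameters are arbitrary, and the Keras attention_mask convention (True/1 means "attend") may differ from the masks the assignment builds, so treat it only as an illustration of where Q1 and the padding mask go.

```python
import tensorflow as tf

# Hypothetical minimal decoder layer; names (mha1, mha2, layernorm1, ...)
# mirror the assignment's, but hyperparameters and mask semantics may differ.
class MiniDecoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model=8, num_heads=2, dff=16, rate=0.1):
        super().__init__()
        self.mha1 = tf.keras.layers.MultiHeadAttention(num_heads, d_model // num_heads)
        self.mha2 = tf.keras.layers.MultiHeadAttention(num_heads, d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout_ffn = tf.keras.layers.Dropout(rate)

    def call(self, x, enc_output, training=False,
             look_ahead_mask=None, padding_mask=None):
        # Block 1: masked self-attention over the decoder input
        attn1, attn_w_b1 = self.mha1(x, x, x, attention_mask=look_ahead_mask,
                                     return_attention_scores=True)
        Q1 = self.layernorm1(attn1 + x)  # residual sum, then layernorm -> Q1

        # Block 2: cross-attention -- the query is Q1 from block 1, the
        # key/value come from the encoder, and the *padding* mask goes here
        attn2, attn_w_b2 = self.mha2(Q1, enc_output, enc_output,
                                     attention_mask=padding_mask,
                                     return_attention_scores=True)
        out2 = self.layernorm2(attn2 + Q1)

        # Block 3: feed-forward network, dropout only on the ffn output
        ffn_out = self.dropout_ffn(self.ffn(out2), training=training)
        return self.layernorm3(ffn_out + out2), attn_w_b1, attn_w_b2
```

The common failure modes the tests catch correspond to the wiring above: summing the wrong tensors before layernorm1, or passing something other than Q1 (or omitting the padding mask) in the self.mha2 call.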

When the code is completed correctly, Q1 is:

tf.Tensor(
[[[ 1.1767974 -0.35743523 -0.66738844 -2.068485 -0.43661404
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.66738844 -2.068485 -0.43661404
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.6673889 -2.0684855 -0.43661404
0.9106817 -0.91232944 0.621634 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.66738844 -2.068485 -0.43661404
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.66738844 -2.0684855 -0.43661404
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.3574357 -0.6673889 -2.068485 -0.4366145
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767969 -0.35743523 -0.6673889 -2.0684853 -0.4366145
0.9106817 -0.9123292 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.66738844 -2.068485 -0.4366145
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370359
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.66738844 -2.068485 -0.43661404
0.9106817 -0.91232944 0.6216345 -0.5611186 0.00370312
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.6673889 -2.0684853 -0.4366145
0.9106817 -0.9123292 0.6216345 -0.5611186 0.00370312
1.6866469 0.6039038 ]
[ 1.1767974 -0.35743523 -0.6673889 -2.0684853 -0.4366145
0.9106817 -0.9123292 0.6216345 -0.5611186 0.00370359
1.6866474 0.6039038 ]
[ 1.1767974 -0.3574357 -0.6673889 -2.0684853 -0.43661404
0.9106822 -0.91232944 0.621634 -0.5611186 0.00370359
1.6866465 0.6039038 ]
[ 1.1767979 -0.35743523 -0.6673889 -2.0684853 -0.4366145
0.9106817 -0.9123292 0.6216345 -0.5611186 0.00370359
1.6866465 0.6039038 ]
[ 1.1767969 -0.35743523 -0.6673889 -2.0684853 -0.43661404
0.9106817 -0.91232896 0.6216345 -0.5611186 0.00370312
1.6866469 0.6039038 ]
[ 1.1767979 -0.35743523 -0.6673889 -2.0684853 -0.4366145
0.9106822 -0.91232896 0.6216345 -0.5611186 0.00370312
1.6866469 0.6039038 ]]], shape=(1, 15, 12), dtype=float32)

You can check if yours is the same to better locate the error.

Edit: it is Exercise 2 - DecoderLayer in the section 7.1 - Decoder Layer, right? (Not exercise 7)

Hi @Anna_Kay, after some hours I managed to solve the problem. It really is a small detail that we have to get right to pass this exercise. Thank you so much!


Hello @efroes @Anna_Kay
I am getting the same error. How did you solve this issue?

Failed test case: Wrong values in ‘attn_w_b2’. Check the call to self.mha2.
Expected: [0.34003818, 0.32569194, 0.33426988]
Got: [0.34083953 0.32673767 0.33242285]

Failed test case: Wrong values in ‘out’.
Expected: [1.1810006, -1.5600019, 0.41289005, -0.03388882]
Got: [ 1.3311304 -1.4207214 0.365438 -0.275847 ]

Failed test case: Wrong values in ‘out’ when we mask the last word. Are you passing the padding_mask to the inner functions?.
Expected: [1.1297308, -1.6106694, 0.32352272, 0.15741566]
Got: [ 1.3888907 -1.414115 0.2009444 -0.17572011]

Hi @Sayed_Shahid_Hussain!

I did not encounter this error myself, nor did I manage to recreate it, so I cannot point you directly to the solution.

Perhaps checking the value of Q1 that you should be getting (see the post above) will help you locate the error, i.e. whether it occurs before or after that line.

Best


Did you get any other error?

It is stating that you have missed the padding mask, so check whether the mask is passed correctly in the call to self.mha2.

Yes, I have passed the padding mask to the function, yet that hasn't resolved the issue. The function runs and matches the expected output, but it still fails the unit test with the error given above.

Is this error from the DecoderLayer graded cell?

@Sayed_Shahid_Hussain

You can share a screenshot of the DecoderLayer code cell by personal DM, so your code can be cross-checked.


Thank you @Anna_Kay I am trying to debug the error.

Alright, thanks for your response. I will share it in a while.

Hello @Sayed_Shahid_Hussain

Issues with your code

  1. For Block 1 and Block 2, the only dropout the instructions mention is "Dropout will be applied during training (~1 line)". Adding dropout to mult_attn_out1 and mult_attn_out2 was not required; those two steps need to be removed.

  2. The training argument is only used where the instructions say to "apply a dropout layer to the ffn output", nowhere else (remove it from any other place where you have used it).

  3. For the line
    "apply layer normalization (layernorm1) to the sum of the attention output and the input (~1 line)"
    the result should be Q1.
    Using skip1 instead of Q1 is not an error by itself, but I would advise writing code only where None is given or where you are asked to write it. Notice that the Block 2 comment says to "calculate self-attention using the Q from the first block" (that Q is Q1); renaming it to something of your own can create issues while debugging, and as far as I can see this renaming caused your next major errors.

  4. For the line
    "pass the output of the second block through a ffn"
    you wrote ffn_output = self.ffn(skip2).
    Your skip2 is the layer normalization of the sum of the attention output and the output of the first block, but the instruction asks you to pass the output of the second block, i.e. mult_attn_out2; using skip2 here is incorrect.

  5. For the line
    "apply layer normalization (layernorm3) to the sum of the ffn output and the output of the second block"
    you wrote out3 = self.layernorm3(skip2 + ffn_output).
    The instruction asks you to use ffn_output and the output of the second block, i.e. mult_attn_out2, but you have used skip2, which is incorrect.
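To make the final block's wiring from the points above concrete, here is a hypothetical standalone sketch. The variable names (out2, ffn_output, out3) and shapes are illustrative stand-ins, not the assignment's code; out2 represents whatever tensor your Block 2 produces. The point is that the ffn consumes the Block 2 output and layernorm3 adds the ffn output back to that same tensor, not to some earlier skip variable.

```python
import tensorflow as tf

d_model, dff = 4, 8

# Illustrative stand-ins for the layers the decoder layer would own
ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(dff, activation="relu"),  # expansion
    tf.keras.layers.Dense(d_model),                 # projection back to d_model
])
layernorm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
dropout_ffn = tf.keras.layers.Dropout(0.1)

out2 = tf.random.uniform((1, 3, d_model))  # stand-in for the Block 2 output

ffn_output = ffn(out2)                                # ffn takes the Block 2 output
ffn_output = dropout_ffn(ffn_output, training=False)  # dropout only on the ffn output
out3 = layernorm3(ffn_output + out2)                  # residual uses that same tensor
```

Feeding an older residual variable into either the ffn or the final layernorm (as in points 4 and 5) breaks this pattern and shifts the output values, which is consistent with the "Wrong values in 'out'" failures.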

Regards
DP


Hello @Deepti_Prasad

Thank you so much for taking the time to review my code and provide detailed instructions on how to address the issues with the DecoderLayer class error. I truly appreciate your support.
I will diligently work through your instructions to ensure that the issues are resolved according to your guidance. Your assistance is invaluable to me, and I am grateful for your expertise.

Best

Let me know once issue is resolved.

Keep Learning!!!

Regards
DP
