Hello everyone! I have a problem with the Week 2 assignment, specifically Exercise 7, the decoder layer. I can't find the cause of the failure. I have tried to understand it myself and searched dozens of forum threads for a solution, but at this point I don't see what's wrong; everything looks right to me. Could anyone help me resolve this?
Failed test case: Wrong values in ‘attn_w_b2’. Check the call to self.mha2.
Expected: [0.34003818, 0.32569194, 0.33426988]
Got: [0.34083953 0.32673767 0.33242285]
Failed test case: Wrong values in ‘out’.
Expected: [1.1810006, -1.5600019, 0.41289005, -0.03388882]
Got: [ 1.3311304 -1.4207214 0.365438 -0.275847 ]
Failed test case: Wrong values in ‘out’ when we mask the last word. Are you passing the padding_mask to the inner functions?.
Expected: [1.1297308, -1.6106694, 0.32352272, 0.15741566]
Got: [ 1.3888907 -1.414115 0.2009444 -0.17572011]
I was not able to exactly recreate the error you are getting, but I managed to get similar errors (same error messages, slightly different values). They occurred either when Q1 was not computed as the correct sum, or when the first argument in the call to self.mha2 (in Block 2) was not the right one.
Assuming that your computation of
mult_attn_out1, attn_weights_block1 = ...
is correct (since you don't get any errors about attn_w_b1), the error is either in the computation of Q1, or in
mult_attn_out2, attn_weights_block2 = ...
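To make the intended dataflow concrete, here is a minimal sketch of Blocks 1 and 2 using tf.keras.layers.MultiHeadAttention. The variable names follow the assignment's conventions, but the tensor sizes and hyperparameters below are arbitrary illustration values, not the graded configuration:

```python
import tensorflow as tf

# Toy dimensions, chosen only for illustration
batch, tgt_len, src_len, d_model = 2, 3, 4, 8

x = tf.random.normal((batch, tgt_len, d_model))           # decoder input
enc_output = tf.random.normal((batch, src_len, d_model))  # encoder output

mha1 = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=d_model)
mha2 = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=d_model)
layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

# Block 1: self-attention over the decoder input
mult_attn_out1, attn_weights_block1 = mha1(
    query=x, value=x, key=x, return_attention_scores=True)

# Q1 is the residual sum passed through layernorm1 -- if this sum is
# wrong, attn_w_b2 downstream will be wrong too
Q1 = layernorm1(mult_attn_out1 + x)

# Block 2: cross-attention -- Q1 must be the *query*; keys and values
# come from the encoder output
mult_attn_out2, attn_weights_block2 = mha2(
    query=Q1, value=enc_output, key=enc_output,
    return_attention_scores=True)

print(attn_weights_block2.shape)  # (batch, num_heads, tgt_len, src_len)
```

The key point is the argument order in the second call: if x (or anything other than Q1) is passed as the query to mha2, you get exactly the "Wrong values in 'attn_w_b2'" style of failure.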
Hello @efroes @Anna_Kay
I am getting the same error. How did you solve this issue?
Failed test case: Wrong values in ‘attn_w_b2’. Check the call to self.mha2.
Expected: [0.34003818, 0.32569194, 0.33426988]
Got: [0.34083953 0.32673767 0.33242285]
Failed test case: Wrong values in ‘out’.
Expected: [1.1810006, -1.5600019, 0.41289005, -0.03388882]
Got: [ 1.3311304 -1.4207214 0.365438 -0.275847 ]
Failed test case: Wrong values in ‘out’ when we mask the last word. Are you passing the padding_mask to the inner functions?.
Expected: [1.1297308, -1.6106694, 0.32352272, 0.15741566]
Got: [ 1.3888907 -1.414115 0.2009444 -0.17572011]
I did not encounter this error myself, nor did I manage to recreate it, so I cannot directly point you to the solution.
Perhaps checking the value of Q1 that you should be getting (see the post above) will help you locate the erroneous line (i.e. whether the error occurs before or after Q1).
Yes, I have passed the padding mask to the function, but that hasn't resolved the issue. The function runs and matches the expected output shown in the notebook, yet the unit test still fails with the errors given above.
For Block 1 and Block 2, the notebook clearly mentions "Dropout will be applied during training (~1 line)", so adding dropout to mult_attn_out1 and mult_attn_out2 yourself was not required (those two steps need to be removed).
The training argument is only used where the instructions say to "apply a dropout layer to the ffn output", nowhere else (so remove it from any other place where you have used it).
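To illustrate the point about the training flag, here is a small sketch of the ffn path. The layer sizes are made up for illustration; the only thing being demonstrated is that dropout is applied once, to the ffn output, and that the training flag controls whether it actually drops anything:

```python
import tensorflow as tf

d_model, dff = 8, 32

# Stand-ins for the layer's sublayers (illustrative sizes)
ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(dff, activation="relu"),
    tf.keras.layers.Dense(d_model),
])
dropout_ffn = tf.keras.layers.Dropout(0.5)

block2_out = tf.ones((2, 3, d_model))  # stand-in for the second block's output

ffn_output = ffn(block2_out)
# The only place `training` belongs: the dropout applied to the ffn output.
# With training=False, Dropout acts as the identity.
out_eval = dropout_ffn(ffn_output, training=False)
print(bool(tf.reduce_all(out_eval == ffn_output)))  # True
```

With training=True the same call would randomly zero activations, which is why passing training to places the instructions don't mention changes the values the grader checks.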
For the code line corresponding to the instruction
"apply layer normalization (layernorm1) to the sum of the attention output and the input (~1 line)"
the result should be assigned to Q1.
Using skip1 instead of Q1 is not an error in itself, but I would advise writing code only where None is given or where you are asked to write it. If you look at the Block 2 instructions for this cell, they say to
"calculate self-attention using the Q from the first block" (this Q is Q1),
so renaming it to something of your own can create issues while debugging. As far as I can see, this renaming could have caused the subsequent major errors.
Recall the instruction
"pass the output of the second block through a ffn"
against your code line
ffn_output = self.ffn(skip2)
Your skip2 is the layer normalization of the sum of the attention output and the output of the first block, but the instruction tells you to pass the output of the second block, i.e. mult_attn_out2; using skip2 here is incorrect.
Similarly, your code line
out3 = self.layernorm3(skip2 + ffn_output)
should follow the instruction
"apply layer normalization (layernorm3) to the sum of the ffn output and the output of the second block"
which tells you to use ffn_output and the output of the second block, i.e. mult_attn_out2; you have used skip2, which is incorrect.
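Putting the corrections above together, here is a hedged sketch of the whole forward pass. The class name, default hyperparameters, and toy inputs below are my own illustration (not the official solution); the sublayer names and the order of operations follow the assignment's instructions:

```python
import tensorflow as tf

class DecoderLayerSketch(tf.keras.layers.Layer):
    """Illustrative decoder layer; hyperparameters are arbitrary."""

    def __init__(self, d_model=8, num_heads=2, dff=32, rate=0.1):
        super().__init__()
        self.mha1 = tf.keras.layers.MultiHeadAttention(num_heads, d_model)
        self.mha2 = tf.keras.layers.MultiHeadAttention(num_heads, d_model)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout_ffn = tf.keras.layers.Dropout(rate)

    def call(self, x, enc_output, training=False,
             look_ahead_mask=None, padding_mask=None):
        # Block 1: masked self-attention + residual + layernorm1 -> Q1
        attn1, attn_w_b1 = self.mha1(
            query=x, value=x, key=x, attention_mask=look_ahead_mask,
            return_attention_scores=True)
        Q1 = self.layernorm1(attn1 + x)

        # Block 2: cross-attention, Q1 as the query, padding_mask applied
        attn2, attn_w_b2 = self.mha2(
            query=Q1, value=enc_output, key=enc_output,
            attention_mask=padding_mask, return_attention_scores=True)
        out2 = self.layernorm2(attn2 + Q1)  # output of the second block

        # FFN path: dropout only here, controlled by `training`
        ffn_output = self.ffn(out2)
        ffn_output = self.dropout_ffn(ffn_output, training=training)
        out3 = self.layernorm3(ffn_output + out2)
        return out3, attn_w_b1, attn_w_b2

x = tf.random.normal((2, 3, 8))
enc = tf.random.normal((2, 4, 8))
out3, w1, w2 = DecoderLayerSketch()(x, enc)
print(out3.shape, w2.shape)
```

Note how out2 (the output of the second block) feeds both the ffn and the final residual sum, which is exactly the point being made about skip2 above.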
Thank you so much for taking the time to review my code and provide detailed instructions on how to address the issues with the DecoderLayer class error. I truly appreciate your support.
I will diligently work through your instructions to ensure that the issues are resolved according to your guidance. Your assistance is invaluable to me, and I am grateful for your expertise.