I came across an issue with the DecoderLayer function. I believe I have everything correct; however, the unit test is failing with the error:
AssertionError: Wrong values in attn_w_b2. Check the call to self.mha2
However, when I comment out that assert in the unit test section, everything passes, and the grader also gives me 100% for the exercise. I'm a little confused about whether I actually have it correct.
I have the following code for the 2nd attention: attn2, attn_weights_block2 = self.mha2(enc_output, enc_output, out1, padding_mask, return_attention_scores=True)
Hi
I am also getting the same error:
Cell #20. Can't compile the student's code. Error: AssertionError('Wrong values in attn_w_b2. Check the call to self.mha2')
But I don't see any issue with my code; the error seems to be in the test code.
Could a mentor please check and respond?
Thanks
Anand
I am having the same problem as the OP. If I comment out the DecoderLayer assert, then the Decoder tests pass. If I uncomment it and modify my DecoderLayer to remove the error, then the Decoder tests fail. There appears to be a discrepancy between the mha2 call in this assignment and the corresponding call on which it is clearly based, at this site:
I am not sure which is correct, because based on the comments in the exercise and the documentation for the MultiHeadAttention layer, the exercise should be correct. However, if I change my code to match the tensorflow.org page, it flows through, except for your assert error.
EDIT: Like the OP, if I comment out the offending asserts in the DecoderLayer_test block, I can send the entire assignment to the grader and receive 100/100.
I think the issue is that the order of the parameters passed to the mha call differs between the assignment (which uses TensorFlow's built-in MultiHeadAttention class) and the "Transformer model for language understanding" webpage referenced here (which implements its own version of the same class).
Nonetheless, even after fixing that error, I still run into the dreaded 'outd' assert issue at the Decoder stage.
EDIT: I tried to use the “incorrect” parameter order and comment out the unit test as suggested above, but I ran into the “Cell 20: can’t compile the student’s code” error, which failed the autograder.
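To see that the argument order really matters with the built-in layer, here is a minimal standalone sketch (not the assignment's code; the shapes and layer sizes are made up purely for illustration). With tf.keras.layers.MultiHeadAttention the positional arguments are query, value, key, so passing enc_output first makes it the query:

import numpy as np
import tensorflow as tf

# Illustrative shapes only: batch=1, seq_len=3, d_model=4
out1 = tf.random.normal((1, 3, 4))        # decoder block-1 output (should be the query)
enc_output = tf.random.normal((1, 3, 4))  # encoder output (should be value and key)

mha2 = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=2)

# Order per the Keras docs and the lecture: query, value, key
_, w_lecture = mha2(query=out1, value=enc_output, key=enc_output,
                    return_attention_scores=True)

# Same layer called positionally as (enc_output, enc_output, out1, ...):
# now enc_output is the query and out1 is the key, so the weights differ
_, w_swapped = mha2(enc_output, enc_output, out1,
                    return_attention_scores=True)

print(np.allclose(w_lecture.numpy(), w_swapped.numpy()))  # False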
I commented out the following two assertion statements, after which I was also able to get 100 on the assignment:
#assert np.allclose(attn_w_b2[0, 0, 1], [0.34485385, 0.33230072, 0.32284543]), "Wrong values in attn_w_b2. Check the call to self.mha2"
#assert np.allclose(out[0, 0], [0.64775777, -1.5134472, 1.1092964, -0.24360693]), "Wrong values in out"
So, yes, either there is something wrong with the assertion statements and it is correct to follow the TensorFlow guide verbatim, or we are bypassing an error that should be flagged and the self.mha2 call really is different from the guide.
In TF's documentation for the MultiHeadAttention layer, the tensors passed to the layer are: query, value, key, attention_mask. (Note the order of the parameters in the call signature: "query" comes before "value" and "key".)
I passed the mha2 parameters as "query=out1, value=enc_output, key=enc_output" so that the correct order according to the lecture is ensured, and got the same issues with the full Decoder unit test.
Only passing the mha2 parameters as "self.mha2(enc_output, enc_output, out1, …" helps with that issue (after commenting out the two lines in DecoderLayer_test; the grader doesn't seem to work if the tests fail).
However, in the Decoder_test the line
assert np.allclose(outd[1, 1], [-0.34560338, -0.8762897, -0.4767484, 1.6986415]), "Wrong values in outd"
now passes (and the grader gives 100/100).
=> Consequently, that line in Decoder_test is based on passing the args in the wrong order (or at least inconsistently with the lecture).
I think those test values should be updated by the mentors.
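For reference, here is a small self-contained example of the documented keyword order with tf.keras.layers.MultiHeadAttention and what the returned tensors look like (the shapes are made up purely for illustration, not taken from the assignment):

import tensorflow as tf

# Illustrative shapes only: batch=1, target_len=3, input_len=4, d_model=8
out1 = tf.random.normal((1, 3, 8))        # query: output of the decoder's first block
enc_output = tf.random.normal((1, 4, 8))  # value and key: encoder output
padding_mask = tf.ones((1, 3, 4))         # (batch, target_len, input_len); 1 means "attend" in Keras

mha2 = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)

attn2, attn_weights_block2 = mha2(
    query=out1,
    value=enc_output,
    key=enc_output,
    attention_mask=padding_mask,
    return_attention_scores=True)

print(attn2.shape)                # (1, 3, 8): (batch, target_len, d_model)
print(attn_weights_block2.shape)  # (1, 2, 3, 4): (batch, num_heads, target_len, input_len)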
Thanks to all for uncovering the parameter issue. Was a rather tedious bug!
Hi,
I am also stuck on Exercise 6, and even though I swapped the positions to out1, enc_output, enc_output in self.mha2, I still have the same problem with the unit test for DecoderLayer.
The deadline is on Monday and I do not know what to do.
# BLOCK 1
# calculate self-attention and return attention scores as attn_weights_block1 (~1 line)
attn1, attn_weights_block1 = self.mha1(x, x, x, look_ahead_mask, return_attention_scores=True)
# BLOCK 2
# calculate attention (encoder-decoder attention) using the Q from the first block and K and V from the encoder output.
# Return attention scores as attn_weights_block2 (~1 line)
attn2, attn_weights_block2 = self.mha2(out1, enc_output, enc_output, padding_mask, return_attention_scores=True)
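If the positional order is easy to mix up, the same two calls can also be written with explicit keyword arguments (a sketch assuming the layers are tf.keras.layers.MultiHeadAttention, as in the assignment), which makes the intended roles unambiguous:

# BLOCK 1: self-attention over the decoder input
attn1, attn_weights_block1 = self.mha1(
    query=x, value=x, key=x,
    attention_mask=look_ahead_mask,
    return_attention_scores=True)
# BLOCK 2: the query comes from block 1, the key and value from the encoder output
attn2, attn_weights_block2 = self.mha2(
    query=out1, value=enc_output, key=enc_output,
    attention_mask=padding_mask,
    return_attention_scores=True)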
I have unlisted this thread because its contents are 2.5 years old and have become obsolete (it recommended modifying the public_utils.py file, which is not a good idea).