Hello. I have managed to implement the calculation of the attention output (mult_attn_out2) of the second MHA block successfully, or at least it passes its unit tests, but the values in the output of the feed-forward network are wrong according to the final unit test. So something is going wrong in the add+norm layer after the second MHA block, the FFN layer, the dropout layer, the final add+norm layer, or some combination of these, but I cannot find the problem. My code is attached below; hopefully someone can see what I cannot.
DecoderLayer call method code
mentor edit: code removed
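[Editor's note: since the original code had to be removed, below is a minimal, generic sketch of the standard Transformer decoder-layer call flow that the failing assertion exercises (cross-attention, add+norm, FFN, dropout, final add+norm). It follows the architecture from "Attention Is All You Need" and the TensorFlow tutorial; the layer names (mha1, mha2, ffn, layernorm1-3, dropout_ffn) and signatures are assumptions and may differ from the assignment's starter code. It is a reference sketch, not the poster's removed implementation.]

```python
import tensorflow as tf

class DecoderLayerSketch(tf.keras.layers.Layer):
    """Hypothetical decoder layer illustrating the expected data flow."""

    def __init__(self, embedding_dim, num_heads, fully_connected_dim, dropout_rate=0.1):
        super().__init__()
        self.mha1 = tf.keras.layers.MultiHeadAttention(num_heads, embedding_dim)
        self.mha2 = tf.keras.layers.MultiHeadAttention(num_heads, embedding_dim)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation="relu"),
            tf.keras.layers.Dense(embedding_dim),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout_ffn = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
        # Block 1: masked self-attention over the decoder input, then add+norm.
        attn1, attn_w_b1 = self.mha1(
            x, x, x, attention_mask=look_ahead_mask, return_attention_scores=True)
        Q1 = self.layernorm1(attn1 + x)

        # Block 2: cross-attention over the encoder output, then add+norm.
        # Note the skip connection uses Q1 (the first add+norm output), not x.
        attn2, attn_w_b2 = self.mha2(
            Q1, enc_output, enc_output,
            attention_mask=padding_mask, return_attention_scores=True)
        Q2 = self.layernorm2(attn2 + Q1)

        # Feed-forward network applied to Q2 (the normalized sum), then dropout,
        # which is only active when training=True.
        ffn_output = self.ffn(Q2)
        ffn_output = self.dropout_ffn(ffn_output, training=training)

        # Final add+norm: the skip connection comes from Q2, the FFN's input.
        out3 = self.layernorm3(ffn_output + Q2)
        return out3, attn_w_b1, attn_w_b2
```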
Error Message
AssertionError                            Traceback (most recent call last)
<ipython-input-...> in <module>
      1 # UNIT TEST
----> 2 DecoderLayer_test(DecoderLayer, create_look_ahead_mask)

~/work/W4A1/public_tests.py in DecoderLayer_test(target, create_look_ahead_mask)
    180     assert np.allclose(attn_w_b1[0, 0, 1], [0.5271505, 0.47284946, 0.], atol=1e-2), "Wrong values in attn_w_b1. Check the call to self.mha1"
    181     assert np.allclose(attn_w_b2[0, 0, 1], [0.32048798, 0.390301, 0.28921106]), "Wrong values in attn_w_b2. Check the call to self.mha2"
--> 182     assert np.allclose(out[0, 0], [-0.22109576, -1.5455486, 0.852692, 0.9139523]), "Wrong values in out"
    183
    184

AssertionError: Wrong values in out