Wrong value error in UNQ_C6

The automatic checker fails with:
AssertionError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_37688\3686812772.py in
----> 2 DecoderLayer_test(DecoderLayer, create_look_ahead_mask)

c:\gilad\my courses\coursera\Reccurent Neural Networks\W4A1\public_tests.py in DecoderLayer_test(target, create_look_ahead_mask)
178 assert tuple(tf.shape(out).numpy()) == q.shape, f"Wrong shape. We expected {q.shape}"
--> 180 assert np.allclose(attn_w_b1[0, 0, 1], [0.5271505, 0.47284946, 0.], atol=1e-2), "Wrong values in attn_w_b1. Check the call to self.mha1"
181 assert np.allclose(attn_w_b2[0, 0, 1], [0.32048798, 0.390301, 0.28921106]), "Wrong values in attn_w_b2. Check the call to self.mha2"
182 assert np.allclose(out[0, 0], [-0.22109576, -1.5455486, 0.852692, 0.9139523]), "Wrong values in out"

AssertionError: Wrong values in attn_w_b1. Check the call to self.mha1

I double checked my code, but cannot find the problem.
Please help. Thx, Gilad

Please send me your code for this function in a private message: click my name, then Message.

Also, always mention the assignment in your title or explanation.

You need to check how you call self.mha1, since that call produces the values of attn_w_b1.

Read these instructions:

  1. Block 1 is a multi-head attention layer with a residual connection, and look-ahead mask. Like in the EncoderLayer, Dropout is defined within the multi-head attention layer.
    The first two blocks are fairly similar to the EncoderLayer except you will return attention_scores when computing self-attention

So make sure your call to self.mha1 passes the look_ahead_mask and sets return_attention_scores=True.

Another common mistake in this assignment is passing the training argument in that call, which is not required here for attn_w_b1.
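To see why the expected row for attn_w_b1 ends in 0., here is a minimal pure-Python sketch of what a look-ahead mask does to one row of attention weights. It is not the assignment code and does not use the Keras layer: look_ahead_mask and masked_softmax are hypothetical helpers, the scores are made up, and the convention that 1 means "may attend" is an assumption.

```python
import math

def look_ahead_mask(size):
    """Hypothetical stand-in: entry [i][j] is 1 when position i may attend to j."""
    return [[1 if j <= i else 0 for j in range(size)] for i in range(size)]

def masked_softmax(scores, mask_row):
    """Softmax over one row of scores; masked-out positions end up with weight 0."""
    # Push masked scores to a very large negative value before the softmax.
    masked = [s if m == 1 else -1e9 for s, m in zip(scores, mask_row)]
    peak = max(masked)
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

mask = look_ahead_mask(3)

# Made-up raw attention scores for the query at position 1.
scores_row_1 = [0.2, 0.1, 0.9]
weights_row_1 = masked_softmax(scores_row_1, mask[1])

print(mask)           # lower-triangular pattern
print(weights_row_1)  # the last entry (a future position) is 0.0
```

The query at position 1 can only see positions 0 and 1, so its weights are split between those two and the third is zero. If your attn_w_b1 row has a non-zero last entry, the mask is probably not reaching self.mha1.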


Thank you for sending me your code, @gilad.danini!
Your code for BLOCK 2 is incorrect. See the instructions again.

# calculate self-attention using the Q from the first block and K and V from the encoder output. 

Here, what is the encoder output? It’s not x. Also, for Block 2, do we need look_ahead_mask or padding_mask?
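As a rough illustration of the Block 2 idea (queries from the decoder side, keys from the encoder output, padded encoder positions hidden), here is a pure-Python sketch. It is not the assignment solution and not the Keras API: cross_attention_weights, the toy vectors, and the convention that 1 marks a real token are all assumptions made for the example.

```python
import math

def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_weights(queries, keys, padding_mask):
    """Scaled dot-product attention weights: one row per query, one column per key.

    padding_mask[j] is 1 for a real encoder token and 0 for padding (assumed convention).
    """
    scale = math.sqrt(len(keys[0]))
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        # Hide padded encoder positions before the softmax.
        scores = [s if m == 1 else -1e9 for s, m in zip(scores, padding_mask)]
        weights.append(softmax(scores))
    return weights

# Toy data: 2 decoder positions (queries) and 3 encoder positions (keys).
block1_out = [[1.0, 0.0], [0.0, 1.0]]
enc_output = [[1.0, 1.0], [0.5, -0.5], [2.0, 2.0]]
padding_mask = [1, 1, 0]  # the last encoder position is padding

w = cross_attention_weights(block1_out, enc_output, padding_mask)
print(len(w), len(w[0]))  # rows follow the decoder length, columns the encoder length
print(w)
```

Note that the weight matrix has one column per encoder position, which is why the keys and values cannot come from x, and that no position is hidden for being "in the future" here, only for being padding.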

Got ya.
So we apply the look-ahead mask only to the actual input to the decoder?

Yes, but only for attn_w_b1.

I hope you are not mixing up Block 1 and Block 2. Re-read the instructions point by point and check that your code follows each one.

OK THX. I am in the final transformer implementation now.