C4W2 Assignment DecoderLayer

My code for DecoderLayer returns the right shapes for attention blocks 1 and 2 (the "test your function" code returns the correct answer), but the unit test w2_unittest.test_decoderlayer(DecoderLayer, create_look_ahead_mask) gives an error:
Failed test case: Wrong values in ‘out’.
Expected: [1.1810006, -1.5600019, 0.41289005, -0.03388882]
Got: [-0.61175704 -0.9107513 -0.14352934 1.6660377 ]

Failed test case: Wrong values in ‘out’ when we mask the last word. Are you passing the padding_mask to the inner functions?.
Expected: [1.1297308, -1.6106694, 0.32352272, 0.15741566]
Got: [-0.15456593 -1.0511776 -0.43255612 1.6382997 ]

I would appreciate any hints :slight_smile:

Ok. Solved it :slight_smile:


Nice work! Thanks for letting us know …

Thank you. Unfortunately, my joy did not last very long because I encountered a problem with grading:
There was a problem compiling the code from your notebook. Details:
Exception encountered when calling layer ‘softmax_3’ (type Softmax).

{{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:CPU:0}} Incompatible shapes: [1,2,2,150] vs. [1,1,1,2] [Op:AddV2] name:

Call arguments received by layer ‘softmax_3’ (type Softmax):
• inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
• mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)
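For anyone decoding that error: the last axis of the attention scores is the key (encoder input) sequence length, so a padding mask whose last dimension matches the decoder input instead cannot broadcast against it. A minimal NumPy sketch of the shape arithmetic (the shapes are taken from the error above; NumPy raises the analogous broadcasting error that TensorFlow's AddV2 does):

```python
import numpy as np

# Cross-attention scores in the decoder's second block have shape
# (batch, num_heads, target_seq_len, input_seq_len) = (1, 2, 2, 150).
scores = np.zeros((1, 2, 2, 150))

# A padding mask built from the DECODER input has last dim 2 and
# cannot broadcast against the scores' last axis of 150:
decoder_mask = np.zeros((1, 1, 1, 2))
try:
    _ = scores + decoder_mask
except ValueError:
    print("incompatible shapes, as in the grader error")

# A padding mask built from the ENCODER input has last dim 150 and
# broadcasts cleanly over the batch, head, and query axes:
encoder_mask = np.zeros((1, 1, 1, 150))
print((scores + encoder_mask).shape)  # (1, 2, 2, 150)
```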

This is the same issue as in this thread: C4W2 cannot graded.
I think my code for the second attention block follows the directions in that thread, but the problem still occurs.
I would be grateful if you could help me with this.

If your code works in the notebook but fails the grader, it probably means it is not general in some way, e.g. it references global variables that are only present in the notebook, or the like.

If those hints are not enough to help, then it’s probably time to look at your code. We can’t do that in the public thread, but I’ll send you a DM about how to do that. It’s bedtime in my timezone (UTC -8), so if we go that route it will need to wait maybe 10 hours before I can respond.

Hi @Lidia_Opiola

Note that when creating the padding mask for the decoder's second attention block, we use the encoder_input.

Isn’t this your mistake too?
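To illustrate that hint, here is a minimal NumPy sketch (a hypothetical stand-in, not the assignment's TensorFlow code) of the second attention block: the queries come from the decoder, the keys and values come from the encoder output, and the padding mask must therefore mark padded positions of the encoder input, matching the key sequence length:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """q: (..., seq_len_q, d); k, v: (..., seq_len_k, d);
    mask broadcasts to (..., seq_len_q, seq_len_k); 1 marks padding."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = scores + mask * -1e9  # push padded keys toward -inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
enc_output = rng.normal(size=(1, 1, 5, 4))  # keys/values: encoder output
dec_hidden = rng.normal(size=(1, 1, 3, 4))  # queries: decoder block-1 output

# Padding mask derived from the ENCODER input: last two tokens are padding.
padding_mask = np.array([[[[0., 0., 0., 1., 1.]]]])  # (1, 1, 1, 5)

out, weights = scaled_dot_product_attention(
    dec_hidden, enc_output, enc_output, padding_mask)
print(out.shape)               # (1, 1, 3, 4)
print(weights[..., 3:].max())  # ~0: padded positions receive no attention
```

Building this mask from the decoder input instead reproduces exactly the broadcasting failure in the grader error above.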