W4A1 | Ex 4 | Encoder Layer

Hi, I’m really frustrated with this week’s exercises and I can’t find really helpful instructions. The syntax of this exercise makes so little sense to me…

(Solution code removed by staff)

I got an assertion error for wrong values. Could someone please help me?

Hey @Fangyi

I’m really sorry to hear that you had a difficult time with this week’s assignment.

Check that you use an attention mask in your attention layer.

Please, remove your code from the post, because it’s against the rules.
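To make the mask tip concrete without posting assignment code: here is a generic, made-up sketch of passing a padding mask into tf.keras.layers.MultiHeadAttention. The layer sizes, tensor shapes, and mask values are all invented for illustration only.

```python
import numpy as np
import tensorflow as tf

# Illustrative only -- num_heads, key_dim and shapes are made up.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)

x = tf.random.uniform((1, 3, 4))      # (batch, seq_len, features)
mask = np.array([[1, 1, 0]])          # 1 = real token, 0 = padding

# attention_mask must broadcast to (batch, num_queries, num_keys),
# so add a queries axis and make it boolean.
attn_mask = mask[:, np.newaxis, :].astype(bool)   # shape (1, 1, 3)

out = mha(query=x, value=x, key=x, attention_mask=attn_mask)
print(out.shape)                      # (1, 3, 4)
```

Note the mask is passed at call time via `attention_mask`, not at layer construction.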

Oh, sorry about the code. Thanks for the tip.

Hello. I am stumbling along on this assignment as well. Fortunately, the documentation on this is really good. Unfortunately, even directly following the TensorFlow documentation, I am still getting issues. I have everything filled out so it can attempt to run from start to finish. I am just stuck on:

class EncoderLayer(tf.keras.layers.Layer):

class of code. I’m not quite sure what the error means yet. The additional hints mention this:

The __init__ method creates all the layers that will be accessed by the call method. Wherever you want to use a layer defined inside the __init__ method you will have to use the syntax self.[insert layer name]

which, when I try putting the method name into the [ ], fails with a different error. What does that hint intend for us to do? Here is the tail of my error message:

InvalidArgumentError: Incompatible shapes: [1,2,3,3] vs. [1,1,3,4] [Op:AddV2]

I have been looking online for that error, but none of the examples posted in forums are similar to the problem I am solving. I just don’t get it. I followed the TensorFlow tutorial on transformers. The code has all the same lines and assigns all the same variables. The parameters are a bit different, but I made the necessary changes. No copy-paste. Here is the code I am running:

Ok. So I got somewhere with the encoder layer. I don’t know why it makes a difference, but it seems like I am on the right track, because I am now failing one of the unit tests: “Wrong Values”. I had to get rid of the “np.sum()” and simply “+” them together. I also got rid of the “_” variable on the first line of code where it first initializes ‘attn_output’. I hope I am going in the right direction and not just finding a quirky way to do it.
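For anyone else landing here: the residual (“skip”) connection pattern being described can be sketched like this. This is a generic illustration, not the graded code; the shapes and the stand-in tensors are made up.

```python
import tensorflow as tf

# Illustrative residual-connection + layer-norm pattern.
layernorm = tf.keras.layers.LayerNormalization(epsilon=1e-6)

x = tf.random.uniform((1, 3, 4))            # layer input (made-up shape)
attn_output = tf.random.uniform((1, 3, 4))  # stand-in for attention output

# Plain tensor addition, not np.sum(): np.sum() reduces the tensor
# to a scalar (or collapses axes), which breaks the shapes.
out = layernorm(x + attn_output)
print(out.shape)                            # (1, 3, 4)
```

The key point is that `+` keeps the (batch, seq_len, features) shape intact, which the following layer normalization expects.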


Ok, found the silly mistake I made. Just an oversight in reading through things.


Hi, after attempting to solve the encoder layer part of the assignment multiple times I still couldn’t figure out how to remove this error.

AssertionError Traceback (most recent call last)
----> 2 EncoderLayer_test(EncoderLayer)

~/work/W4A1/public_tests.py in EncoderLayer_test(target)
92 [[ 0.23017104, -0.98100424, -0.78707516, 1.5379084 ],
93 [-1.2280797 , 0.76477575, -0.7169283 , 1.1802323 ],
---> 94 [ 0.14880152, -0.48318022, -1.1908402 , 1.5252188 ]]), “Wrong values when training=True”
96 encoded = encoder_layer1(q, False, np.array([[1, 1, 0]]))

AssertionError: Wrong values when training=True

I made sure to use training=training in both the dropout layers but couldn’t get to the solution.
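For reference, this is what the training flag changes in a Dropout layer. An illustrative sketch only (made-up rate and shapes), not the assignment code:

```python
import tensorflow as tf

# Dropout is only active when training=True is passed at call time;
# with training=False it is an identity pass-through.
drop = tf.keras.layers.Dropout(rate=0.5)
x = tf.ones((1, 8))

infer_out = drop(x, training=False)  # identity: nothing is dropped
train_out = drop(x, training=True)   # random units zeroed, rest rescaled

print(bool(tf.reduce_all(infer_out == x)))  # True
```

This is why the instructions insist on forwarding `training=training`: hard-coding it (or omitting it) makes the layer behave identically in training and inference, which changes the values the unit test checks.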


Hi @Aadesh_Upadhyay, have you removed this error?
I’m stuck here.

Hi @Fangyi, I got the same error as the following.

Could you please tell me how to use the mask in the attention layer? I have been stuck here for a long time. :pleading_face:


I solved this by correctly adding the arguments in the

[code removed - moderator]

Check that you have used all the arguments, including mask and training

Hello all!

Here’s a useful article that explains how and why we use masking and then train the layers step by step, especially in attention.

We first have to learn why masking is an important prerequisite.

The notebook C5_W4_A1_Transformer_Subclass_v1
explains all of that thoroughly, but this article also provides other generalisations that are important for understanding how the whole thing works at its core.

I had the same problem (Wrong values) and I solved it after paying close attention to the instructions. Every instruction (e.g., sum; q, k, v must be the same; use training…) matters here to get the solution right. “Attention is all you need!”

I also get this error. I’m completely stumped.

Ok, I figured it out. I hadn’t seen the additional hints section in the lab. Reading the docs for MultiHeadAttention and its call arguments helped.
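For anyone else who gets stuck there, the MultiHeadAttention call arguments can be sketched like this for the self-attention case. This is an illustrative example with made-up sizes, not the graded code:

```python
import tensorflow as tf

# Self-attention: query, key and value are all the same tensor.
# num_heads, key_dim and shapes here are made up for illustration.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)
x = tf.random.uniform((1, 3, 4))

out, scores = mha(
    query=x, value=x, key=x,
    attention_mask=None,            # optional padding / look-ahead mask
    return_attention_scores=True,   # also return the per-head weights
    training=False,                 # forward the training flag
)
print(out.shape, scores.shape)
# out:    (1, 3, 4)  -- (batch, seq_len, features)
# scores: (1, 2, 3, 3) -- (batch, num_heads, queries, keys)
```

Note that the attention-scores tensor has a per-head axis, which is why mask and score shapes like [1, 2, 3, 3] show up in the shape-mismatch errors earlier in this thread.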

This week’s assignment is a nightmare. I can work out what the instructions want me to do, but I have zero understanding of why any of this is being done. I’ve watched the videos 3 times, and they leave me with more questions than answers.


Hello all!
I am getting the same error and am not able to figure it out even after trying for a while now.
Can someone please give me a hint for this?