C5 W4 UNQ_C4 Wrong values when training=True

class EncoderLayer(tf.keras.layers.Layer):
{mentor edit: code removed}

This is my code, and I am getting this error:


AssertionError Traceback (most recent call last)
in
1 # UNIT TEST
----> 2 EncoderLayer_test(EncoderLayer)

~/work/W4A1/public_tests.py in EncoderLayer_test(target)
92 [[ 0.23017104, -0.98100424, -0.78707516, 1.5379084 ],
93 [-1.2280797 , 0.76477575, -0.7169283 , 1.1802323 ],
---> 94 [ 0.14880152, -0.48318022, -1.1908402 , 1.5252188 ]]), "Wrong values when training=True"
95
96 encoded = encoder_layer1(q, False, np.array([[1, 1, 0]]))

AssertionError: Wrong values when training=True

You have added a dropout layer that isn't needed.
Remove your first dropout layer.

1 Like

Thank you so much, it worked.

Sorry, but I got the same error, and I was stuck on this for hours. The reason I added the dropout was the instruction.
The line: # calculate self-attention using mha(~1 line). Dropout will be applied during training
Why is dropout mentioned here?

The instruction is trying to tell you not to add a separate dropout layer here in the self-attention step, because the Keras MultiHeadAttention layer implements that for you. It is configured using the dropout_rate argument that is provided in the Encoder() constructor.
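If it helps to see that outside the assignment, here is a minimal standalone sketch (my own toy example with made-up sizes, not the notebook code) showing that a Keras MultiHeadAttention layer built with a dropout argument applies dropout internally only when called with training=True:

import numpy as np
import tensorflow as tf

# Standalone MultiHeadAttention layer with its own internal dropout.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4, dropout=0.5)
x = tf.random.uniform((1, 3, 4))  # (batch_size, seq_len, embedding_dim)

# training=False: the internal dropout is inactive, so repeated calls match exactly.
out_a = mha(query=x, value=x, key=x, training=False)
out_b = mha(query=x, value=x, key=x, training=False)
print(np.allclose(out_a, out_b))   # True

# training=True: the internal dropout is active, so repeated calls differ.
out_c = mha(query=x, value=x, key=x, training=True)
out_d = mha(query=x, value=x, key=x, training=True)
print(np.allclose(out_c, out_d))   # False (with overwhelming probability)

That built-in dropout is also why stacking your own Dropout layer on top of self.mha changes the exact values the unit test checks.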

@TMosh Hi Tom,
I tried to remove the first dropout by using
attn_output = self.mha(x, x, training=False)
but it is still not working. I am getting the same error:
---> 94 [ 0.14880152, -0.48318022, -1.1908402 , 1.5252188 ]]), "Wrong values when training=True"
95
96 encoded = encoder_layer1(q, False, np.array([[1, 1, 0]]))

AssertionError: Wrong values when training=True

Your self.mha(…) call doesn’t have enough arguments.

1 Like

And, "training=False" should not be one of them at all.
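For reference, this is roughly the call pattern the Keras layer expects for self-attention. The sizes, the (1, seq_len) padding mask, and the variable names below are just illustrative (the mask shape matches the one the unit test passes in); any remaining arguments, including training, keep their defaults:

import tensorflow as tf

# Core call signature:
#   output = mha_layer(query, value, key=None, attention_mask=None,
#                      return_attention_scores=False, training=None)
mha_layer = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)
x = tf.random.uniform((1, 3, 4))    # (batch_size, seq_len, embedding_dim)
mask = tf.constant([[1., 1., 0.]])  # padding mask; Keras broadcasts it over heads and query positions
attn_output = mha_layer(query=x, value=x, key=x, attention_mask=mask)
print(attn_output.shape)            # (1, 3, 4)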

1 Like

@TMosh Hi Tom,

I cannot thank you enough :smiley:
I did not include key because it is stated as "optional".
Anyway, I have added the key and the mask, and it is working now.

Thank you for your help! :star2:

1 Like

@TMosh Hi Tom,

I am pretty sure that I got everything right, but nonetheless I still get the same error.

"""
AssertionError Traceback (most recent call last)
in
1 # UNIT TEST
----> 2 EncoderLayer_test(EncoderLayer)

~/work/W4A1/public_tests.py in EncoderLayer_test(target)
92 [[ 0.23017104, -0.98100424, -0.78707516, 1.5379084 ],
93 [-1.2280797 , 0.76477575, -0.7169283 , 1.1802323 ],
---> 94 [ 0.14880152, -0.48318022, -1.1908402 , 1.5252188 ]]), "Wrong values when training=True"
95
96 encoded = encoder_layer1(q, False, np.array([[1, 1, 0]]))

AssertionError: Wrong values when training=True
"""

I added the training argument to dropout_ffn and mha!
self.mha(query=x, value=x, key=x, attention_mask=mask, training=training)
self.dropout_ffn(ffn_output, training=training)

With mha I was not sure, since the description states to leave the defaults:

"Let the default values for return_attention_scores and training. You will also perform Dropout in this multi-head attention layer during training."

In my opinion these two sentences are contradictory, but it does not matter: with or without the training argument in mha, I still get the same error message…

Kind of frustrated on this end of the line… any help would be gladly appreciated.

4 Likes

{edited}
Tip: Be careful what variable name you use in self.layernorm2() for the “output from multi-head attention”. The naming convention for the variables is misleading.

4 Likes

Thanks, both of the posts above helped me debug where I was mistaken in my code.
Thanks @TMosh
Thanks @FredericM

1 Like

To add to this discussion, you have to read the comments carefully to know which variables they’re talking about.

# pass the output of the multi-head attention layer through a ffn (~1 line)
ffn_output = ...  # (batch_size, input_seq_len, fully_connected_dim)

I misunderstood this comment until I carefully read the comment above skip_x_attention.

The multi-head attention layer is not the same as the multi-head attention block. The layer consists of the multi-head attention block followed by the layer normalization block. The output of those two operations in sequence is passed to the feedforward layer (see Fig 2a in the assignment).

The output of self.mha() is the output of the block, but skip_x_attention is the output of the layer.
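Here is a structural sketch of that flow (following Fig 2a and the variable names used in the notebook comments; treat it as an outline under those assumptions, not as the graded lines):

# inside EncoderLayer.call(self, x, training, mask):
mha_output = self.mha(x, x, x, mask)                          # multi-head attention block (self-attention)
skip_x_attention = self.layernorm1(x + mha_output)            # add & norm: this is the output of the attention *layer*
ffn_output = self.ffn(skip_x_attention)                       # the ffn takes the layer output, not mha_output
ffn_output = self.dropout_ffn(ffn_output, training=training)  # dropout only applies during training
encoder_layer_out = self.layernorm2(skip_x_attention + ffn_output)  # second skip connection also uses skip_x_attention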

Knowing this distinction would have saved me a lot of time, so hopefully it helps anyone who makes the same mistake I did.

8 Likes

This was a very helpful comment. Not sure that I would have noticed this on my own. Thanks!

1 Like

Thank you for making this comment.
What helped me was knowing that I should pass the output of skip_x_attention to the ffn layer.
I was earlier trying to pass the mha output to the ffn layer, which is incorrect, as shown in the diagram.
Thank you

1 Like