C5 W4 UNQ_C4 Wrong values when training=True

class EncoderLayer(tf.keras.layers.Layer):
{mentor edit: code removed}

This is my code, and I am getting this error:


AssertionError Traceback (most recent call last)
in
1 # UNIT TEST
----> 2 EncoderLayer_test(EncoderLayer)

~/work/W4A1/public_tests.py in EncoderLayer_test(target)
92 [[ 0.23017104, -0.98100424, -0.78707516, 1.5379084 ],
93 [-1.2280797 , 0.76477575, -0.7169283 , 1.1802323 ],
---> 94 [ 0.14880152, -0.48318022, -1.1908402 , 1.5252188 ]]), "Wrong values when training=True"
95
96 encoded = encoder_layer1(q, False, np.array([[1, 1, 0]]))

AssertionError: Wrong values when training=True

You have added a dropout layer that isn't needed.
Remove your first dropout layer.

1 Like

Thank you so much, it worked.

Sorry, but I got the same error, and I was stuck on this for hours. The reason I added the dropout was the instruction.
The line: # calculate self-attention using mha(~1 line). Dropout will be applied during training
Why is dropout mentioned here?

The instruction is trying to tell you not to add a separate dropout layer here in the self-attention step, because the Keras MultiHeadAttention layer implements that for you. It is configured using the dropout_rate argument that is provided in the Encoder() constructor.
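If it helps to see that outside the assignment, here is a minimal standalone sketch (my own toy example with made-up sizes, not the notebook code) showing that a Keras MultiHeadAttention layer built with a dropout argument applies dropout internally only when called with training=True:

import numpy as np
import tensorflow as tf

# Standalone MultiHeadAttention layer with its own internal dropout.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4, dropout=0.5)
x = tf.random.uniform((1, 3, 4))  # (batch_size, seq_len, embedding_dim)

# training=False: the internal dropout is inactive, so repeated calls match exactly.
out_a = mha(query=x, value=x, key=x, training=False)
out_b = mha(query=x, value=x, key=x, training=False)
print(np.allclose(out_a, out_b))   # True

# training=True: the internal dropout is active, so repeated calls differ.
out_c = mha(query=x, value=x, key=x, training=True)
out_d = mha(query=x, value=x, key=x, training=True)
print(np.allclose(out_c, out_d))   # False (with overwhelming probability)

That built-in dropout is also why stacking your own Dropout layer on top of self.mha changes the exact values the unit test checks.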

@TMosh Hi Tom,
I tried to remove the first dropout by using
attn_output = self.mha(x, x, training=False)
but it is still not working. I am getting the same error:
---> 94 [ 0.14880152, -0.48318022, -1.1908402 , 1.5252188 ]]), "Wrong values when training=True"
95
96 encoded = encoder_layer1(q, False, np.array([[1, 1, 0]]))

AssertionError: Wrong values when training=True

Your self.mha(…) call doesn’t have enough arguments.

1 Like

And, "training=False" should not be one of them at all.
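For reference, this is roughly the call pattern the Keras layer expects for self-attention. The sizes, the (1, seq_len) padding mask, and the variable names below are just illustrative (the mask shape matches the one the unit test passes in); any remaining arguments, including training, keep their defaults:

import tensorflow as tf

# Core call signature:
#   output = mha_layer(query, value, key=None, attention_mask=None,
#                      return_attention_scores=False, training=None)
mha_layer = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)
x = tf.random.uniform((1, 3, 4))    # (batch_size, seq_len, embedding_dim)
mask = tf.constant([[1., 1., 0.]])  # padding mask; Keras broadcasts it over heads and query positions
attn_output = mha_layer(query=x, value=x, key=x, attention_mask=mask)
print(attn_output.shape)            # (1, 3, 4)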

1 Like

@TMosh Hi Tom,

I cannot thank you enough :smiley:
I did not include key because it is stated as "optional".
Anyway, I have added the key and the mask, and it is working now.

Thank you for your help! :star2:

1 Like

@TMosh Hi Tom,

I am pretty sure that I got everything right, but nonetheless I still get the same error.

"""
AssertionError Traceback (most recent call last)
in
1 # UNIT TEST
----> 2 EncoderLayer_test(EncoderLayer)

~/work/W4A1/public_tests.py in EncoderLayer_test(target)
92 [[ 0.23017104, -0.98100424, -0.78707516, 1.5379084 ],
93 [-1.2280797 , 0.76477575, -0.7169283 , 1.1802323 ],
---> 94 [ 0.14880152, -0.48318022, -1.1908402 , 1.5252188 ]]), "Wrong values when training=True"
95
96 encoded = encoder_layer1(q, False, np.array([[1, 1, 0]]))

AssertionError: Wrong values when training=True
"""

I added the training argument to dropout_ffn and mha!
self.mha(query=x, value=x, key=x, attention_mask=mask, training=training)
self.dropout_ffn(ffn_output, training=training)

With mha I was not sure, since the description states to leave the defaults:

"Let the default values for return_attention_scores and training. You will also perform Dropout in this multi-head attention layer during training."

In my opinion these two sentences are contradictory, but it does not matter: with or without the training argument in mha, I still get the same error message…

Kind of frustrated on this end of the line… any help would be gladly appreciated.

4 Likes

{edited}
Tip: Be careful what variable name you use in self.layernorm2() for the “output from multi-head attention”. The naming convention for the variables is misleading.

4 Likes

Thanks, both of the posts above helped me debug where I was mistaken in my code.
Thanks @TMosh
Thanks @FredericM

1 Like

To add to this discussion, you have to read the comments carefully to know which variables they’re talking about.

# pass the output of the multi-head attention layer through a ffn (~1 line)
ffn_output = ...  # (batch_size, input_seq_len, fully_connected_dim)

I misunderstood this comment until I carefully read the comment above skip_x_attention.

The multi-head attention layer is not the same as the multi-head attention block. The layer consists of the multi-head attention block followed by the layer normalization block. The output of those two operations in sequence is passed to the feedforward layer (see Fig 2a in the assignment).

The output of self.mha() is the output of the block, but skip_x_attention is the output of the layer.
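Here is a structural sketch of that flow (following Fig 2a and the variable names used in the notebook comments; treat it as an outline under those assumptions, not as the graded lines):

# inside EncoderLayer.call(self, x, training, mask):
mha_output = self.mha(x, x, x, mask)                          # multi-head attention block (self-attention)
skip_x_attention = self.layernorm1(x + mha_output)            # add & norm: this is the output of the attention *layer*
ffn_output = self.ffn(skip_x_attention)                       # the ffn takes the layer output, not mha_output
ffn_output = self.dropout_ffn(ffn_output, training=training)  # dropout only applies during training
encoder_layer_out = self.layernorm2(skip_x_attention + ffn_output)  # second skip connection also uses skip_x_attention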

Knowing this distinction would have saved me a lot of time, so hopefully it helps anyone who makes the same mistake I did.

8 Likes

This was a very helpful comment. Not sure that I would have noticed this on my own. Thanks!

1 Like

Thank you for making this comment.
What helped me was knowing that I should pass the output of skip_x_attention to the ffn layer.
I was earlier trying to pass the mha output to the ffn layer, which is incorrect, as shown in the diagram.
Thank you

1 Like