I'm always getting the following error; what might be wrong?
One of your layers seems to be missing the “training=training” argument.
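In many versions of this exercise it is the dropout layer that takes it; as a rough sketch (dropout_ffn is the skeleton's usual name, assumed here, not taken from your code):
ffn_output = self.dropout_ffn(ffn_output, training=training)  # dropout only applied when training=True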
Thanks for the reply. I do have “training=training” in every layer. What else might be wrong?
Even weirder, it passed the tests two days ago and now it's showing an error, and I didn't change the code.
Hi @xiayuhu, have you fixed this?
Hi, Liuzifeng.
Please go through this link. It might solve your query. Let us know in case it doesn't suffice; we can always discuss it here.
I am still having this issue. I was wondering if I could get help here.
I added these print statements after every relevant line:
print(f'self_mha_output:{self_mha_output}')
print(f'skip_x_attention:{skip_x_attention}')
print(f'ffn_output:{ffn_output}')
print(f'ffn_output after dropout:{ffn_output}')
print(f'encoder_layer_out:{encoder_layer_out}')
Here is the first set of output:
self_mha_output:[[[ 0.2629684 0.5438655 -0.47695604 0.43180236]
[ 0.27214473 0.5516315 -0.47251672 0.44105405]
[ 0.2637157 0.5352751 -0.46818826 0.44008902]]]
skip_x_attention:[[[ 0.7840514 -0.9639456 -1.0145587 1.1944535 ]
[-1.2134784 1.0835364 -0.7550787 0.885021 ]
[ 0.76012594 -0.20960009 -1.545446 0.9949202 ]]]
ffn_output:[[[-0.40299335 -0.26304182 0.01199517 0.77515805]
[ 0.11928089 0.02366283 0.21244505 0.6133719 ]
[-0.47993705 -0.35966852 0.11620045 0.9476139 ]]]
ffn_output after dropout:[[[-0.44777042 -0.2922687 0.01332797 0.86128676]
[ 0.13253433 0.02629204 0.23605007 0.6815244 ]
[-0.5332634 -0.3996317 0. 1.0529044 ]]]
encoder_layer_out:[[[ 0.23017097 -0.9810039 -0.78707564 1.5379086 ]
[-1.2280797 0.76477575 -0.7169284 1.1802323 ]
[ 0.14880148 -0.4831803 -1.1908401 1.5252188 ]]]
You may try this and see where your values are not matching mine; that is the place to look. But make sure you add these statements in the correct places.
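For reference, here is a minimal, self-contained sketch of a standard Transformer encoder layer showing where those print statements sit. This is not the graded notebook code; the layer names (mha, ffn, layernorm1, layernorm2, dropout_ffn) just follow the usual skeleton and are assumptions on my part.

import tensorflow as tf

class EncoderLayerSketch(tf.keras.layers.Layer):
    def __init__(self, embedding_dim=4, num_heads=2, fully_connected_dim=8,
                 dropout_rate=0.1, layernorm_eps=1e-6):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embedding_dim, dropout=dropout_rate)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation='relu'),
            tf.keras.layers.Dense(embedding_dim)])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=layernorm_eps)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=layernorm_eps)
        self.dropout_ffn = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, training=False, mask=None):
        # self-attention block
        self_mha_output = self.mha(x, x, x, attention_mask=mask, training=training)
        print(f'self_mha_output:{self_mha_output}')
        # first Add & Norm: skip connection with the original input x
        skip_x_attention = self.layernorm1(x + self_mha_output)
        print(f'skip_x_attention:{skip_x_attention}')
        # feed-forward network applied to the normalized output
        ffn_output = self.ffn(skip_x_attention)
        print(f'ffn_output:{ffn_output}')
        # dropout is only active when training=True
        ffn_output = self.dropout_ffn(ffn_output, training=training)
        print(f'ffn_output after dropout:{ffn_output}')
        # second Add & Norm: skip connection with skip_x_attention
        encoder_layer_out = self.layernorm2(skip_x_attention + ffn_output)
        print(f'encoder_layer_out:{encoder_layer_out}')
        return encoder_layer_out

# quick check with a tiny random batch of shape (1, 3, 4)
layer = EncoderLayerSketch()
_ = layer(tf.random.uniform((1, 3, 4)), training=True)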
Best,
Saif.
Thanks! I found the problem.
thanks for that hint, it helped me to debug
Hi. I am having a similar problem with C5_W4 Exercise 4, EncoderLayer.
I added the print statements from your comment and I have narrowed down the issue.
My problem is with encoder_layer_out. Instead of the values you listed:
encoder_layer_out:[[[ 0.23017097 -0.9810039 -0.78707564 1.5379086 ]
[-1.2280797 0.76477575 -0.7169284 1.1802323 ]
[ 0.14880148 -0.4831803 -1.1908401 1.5252188 ]]]
My values are:
encoder_layer_out:[[[-0.61228544 0.04123101 -1.0298333 1.6008877 ]
[-0.12886912 0.22834736 -1.4508611 1.351383 ]
[-0.64349914 -0.11383337 -0.9031621 1.6604947 ]]]
When applying the layer normalization 2, I have included the self_mha_output + ffn_output and the training=training.
Any suggestion on where I can research this to address this issue?
Thanks.
John
Hello John!
How are you implementing this step?
# pass the output of the multi-head attention layer through a ffn (~1 line)
ffn_output = None # (batch_size, input_seq_len, fully_connected_dim)
Wow - that was fast. I fixed that part and just updated my message. My problem was, as you suggested, with ffn_output. I fixed that; see my updated message, as the problem is now with encoder_layer_out.
Thanks
And I got it…
My layernorm2 had the wrong first parameter.
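For anyone else who hits this, those last lines come out roughly like the sketch below in my version (just an illustration using the skeleton's usual names, not copied from the notebook):
ffn_output = self.ffn(skip_x_attention)
ffn_output = self.dropout_ffn(ffn_output, training=training)
encoder_layer_out = self.layernorm2(skip_x_attention + ffn_output)  # skip_x_attention, not self_mha_output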
Saifkhanegr - thanks. The print statement suggestions helped me narrow it down.
Thanks!
I am glad you solved it on your own.
Hi, thanks for the tips, they really help to visualise what is going on.
My problem is with the Normalization 1.
My values are:
skip_x_attention:[[[ 0.5773487 -1.7320461 0.5773487 0.5773487]
[-1.7320461 0.5773487 0.5773487 0.5773487]
[ 0.999998 -0.999998 -0.999998 0.999998 ]]]
while yours is:
skip_x_attention:[[[ 0.7840514 -0.9639456 -1.0145587 1.1944535 ]
[-1.2134784 1.0835364 -0.7550787 0.885021 ]
[ 0.76012594 -0.20960009 -1.545446 0.9949202 ]]]
I don't fully understand how the parameters are passed to this function. I tried passing the input x and the self_mha_output, but neither of them gives me the same output as yours. And the function only accepts one parameter as input, so I was left without options.
The value you pass is the sum of x and self_mha_output.
You can see this from the figure that shows the encoder layer, where it says “Add & Norm”.
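In code, that first Add & Norm is roughly one line (a sketch, assuming the skeleton's layernorm1 name for the first normalization layer):
skip_x_attention = self.layernorm1(x + self_mha_output)  # add the skip connection, then normalize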
Ahhh, thank you very much. I was thinking that the other parameter was passed in some other way that I didn't understand and that the sum was done internally in the function.
Found this to be super useful. Now I know my issue is with skip_x_attention, which is not where I was thinking:
self_mha_output:[[[ 0.2629684 0.5438655 -0.47695604 0.43180236]
[ 0.27214473 0.5516315 -0.47251672 0.44105405]
[ 0.2637157 0.5352751 -0.46818826 0.44008902]]]
skip_x_attention:[[[ 1.2629684 0.5438655 0.523044 1.4318024 ]
[ 0.27214473 1.5516315 0.5274833 1.4410541 ]
[ 1.2637157 0.5352751 -0.46818826 1.440089 ]]]
I'm stuck here, and surprised because this step seemed straightforward:
Now add a skip connection by adding your original input x
and the output of your multi-head attention layer.
We’re just talking about adding x and self_mha_output aren’t we?
(Also I’ve tried both + and tf.keras.layers.Add() just in case, but it didn’t seem to make a difference.)
Thank you
Hint:
(x + self_mha_output)
And you pass it through one of the self.layernorm?(…) methods.
Thank you for your help!
I misinterpreted the instruction as saying not to do that until the next step, which then led to cascading confusion.