[Week4]C5_W4_A1_Transformer_Subclass_v1 Exercise 4 - EncoderLayer

I keep getting the following error. What might be wrong?

One of your layers seems to be missing the “training=training” argument.

Thanks for the reply. I do have “training=training” in every layer. What else might be wrong?
Even weirder, it passed the tests two days ago and is now showing an error, even though I didn’t change the code.


Hi @xiayuhu, have you fixed this?

Hi, Liuzifeng.

Please go through this link. It might solve your query. Let us know if it doesn’t; we can always discuss it here.


I am still having this issue. I was wondering if I could get help here.

I added these print statements after every relevant line:

print(f'self_mha_output:{self_mha_output}')
print(f'skip_x_attention:{skip_x_attention}')
print(f'ffn_output:{ffn_output}')
print(f'ffn_output after dropout:{ffn_output}')
print(f'encoder_layer_out:{encoder_layer_out}')

Here is the first set of output:

self_mha_output:[[[ 0.2629684   0.5438655  -0.47695604  0.43180236]
  [ 0.27214473  0.5516315  -0.47251672  0.44105405]
  [ 0.2637157   0.5352751  -0.46818826  0.44008902]]]
skip_x_attention:[[[ 0.7840514  -0.9639456  -1.0145587   1.1944535 ]
  [-1.2134784   1.0835364  -0.7550787   0.885021  ]
  [ 0.76012594 -0.20960009 -1.545446    0.9949202 ]]]
ffn_output:[[[-0.40299335 -0.26304182  0.01199517  0.77515805]
  [ 0.11928089  0.02366283  0.21244505  0.6133719 ]
  [-0.47993705 -0.35966852  0.11620045  0.9476139 ]]]
ffn_output after dropout:[[[-0.44777042 -0.2922687   0.01332797  0.86128676]
  [ 0.13253433  0.02629204  0.23605007  0.6815244 ]
  [-0.5332634  -0.3996317   0.          1.0529044 ]]]
encoder_layer_out:[[[ 0.23017097 -0.9810039  -0.78707564  1.5379086 ]
  [-1.2280797   0.76477575 -0.7169284   1.1802323 ]
  [ 0.14880148 -0.4831803  -1.1908401   1.5252188 ]]]

You may try this and see where your values stop matching mine; that is the place to look. But make sure you add these statements in the correct places.
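
If it helps, here is a minimal, self-contained sketch of where those print statements sit inside an encoder layer’s call method. This is only an illustration of the placement, not the graded notebook code: the attribute names (mha, ffn, layernorm1, layernorm2, dropout_ffn) mirror the notebook skeleton, and the feed-forward block here is a simple stand-in for the notebook’s FullyConnected helper.

import tensorflow as tf

class EncoderLayerSketch(tf.keras.layers.Layer):
    def __init__(self, embedding_dim=4, num_heads=2, fully_connected_dim=8, dropout_rate=0.1):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)
        # stand-in for the notebook's FullyConnected helper
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation='relu'),
            tf.keras.layers.Dense(embedding_dim),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout_ffn = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, training, mask):
        # self-attention block (query, value and key are all x)
        self_mha_output = self.mha(x, x, x, attention_mask=mask)
        print(f'self_mha_output:{self_mha_output}')

        # first Add & Norm: skip connection with the original input x
        skip_x_attention = self.layernorm1(x + self_mha_output)
        print(f'skip_x_attention:{skip_x_attention}')

        # feed-forward network applied to the normalized output
        ffn_output = self.ffn(skip_x_attention)
        print(f'ffn_output:{ffn_output}')

        # dropout is only active while training
        ffn_output = self.dropout_ffn(ffn_output, training=training)
        print(f'ffn_output after dropout:{ffn_output}')

        # second Add & Norm: skip connection with skip_x_attention
        encoder_layer_out = self.layernorm2(skip_x_attention + ffn_output)
        print(f'encoder_layer_out:{encoder_layer_out}')
        return encoder_layer_out

# quick smoke test with the same (1, 3, 4) shape as the outputs above
layer = EncoderLayerSketch()
_ = layer(tf.random.uniform((1, 3, 4)), training=True, mask=None)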

Best,
Saif.


Thanks! I found the problem.

Thanks for that hint, it helped me debug.

Hi. I am having a similar problem with C5_W4 Exercise 4, EncoderLayer.
I added the print statements from your comment and I have narrowed down the issue.
My problem is with encoder_layer_out. Instead of the values you listed:
encoder_layer_out:[[[ 0.23017097 -0.9810039 -0.78707564 1.5379086 ]
[-1.2280797 0.76477575 -0.7169284 1.1802323 ]
[ 0.14880148 -0.4831803 -1.1908401 1.5252188 ]]]

My values are:
encoder_layer_out:[[[-0.61228544 0.04123101 -1.0298333 1.6008877 ]
[-0.12886912 0.22834736 -1.4508611 1.351383 ]
[-0.64349914 -0.11383337 -0.9031621 1.6604947 ]]]

When applying layer normalization 2, I have included self_mha_output + ffn_output and training=training.
Any suggestions on where I can look to address this issue?
Thanks.
John

Hello John!

How are you implementing this step?

# pass the output of the multi-head attention layer through a ffn (~1 line)
ffn_output = None  # (batch_size, input_seq_len, fully_connected_dim)

Wow, that was fast. My problem was, as you suggested, with ffn_output. I fixed that and updated my message; the problem is now with encoder_layer_out.
Thanks

And I got it…
My layernorm2 had the wrong first parameter.

Saifkhanegr, thanks. The print statement suggestions helped me narrow it down.
Thanks!

I am glad you solved it on your own.


Hi, thanks for the tips, they really help to visualise what is going on.
My problem is with Normalization 1.

My values are
skip_x_attention:[[[ 0.5773487 -1.7320461 0.5773487 0.5773487]
[-1.7320461 0.5773487 0.5773487 0.5773487]
[ 0.999998 -0.999998 -0.999998 0.999998 ]]]
When yours is
skip_x_attention:[[[ 0.7840514 -0.9639456 -1.0145587 1.1944535 ]
[-1.2134784 1.0835364 -0.7550787 0.885021 ]
[ 0.76012594 -0.20960009 -1.545446 0.9949202 ]]]

I don’t fully understand how the parameters are passed to this function. I tried passing the input x and also self_mha_output, but neither of them gives me the same output as yours. And the function only accepts one parameter as input, so I was left without options.

The value you pass is the sum of x and self_mha_output.

You can see this from the figure that shows the encoder layer, where it says “Add & Norm”.
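
If it helps to see it outside the notebook, here is a tiny standalone sketch of that “Add & Norm” step with made-up tensors (the variable names only mirror the notebook’s, this isn’t the graded code):

import tensorflow as tf

x = tf.random.uniform((1, 3, 4))                # original input to the encoder layer
self_mha_output = tf.random.uniform((1, 3, 4))  # output of the self-attention block
layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

# "Add & Norm": residual (skip) connection first, then layer normalization
skip_x_attention = layernorm1(x + self_mha_output)
print(skip_x_attention.shape)  # (1, 3, 4)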

Ahhh, thank you very much. I was thinking that the other parameter was passed some other way that I didn’t understand and that the sum was done internally in the function.

Found this to be super useful. Now I know my issue is with skip_x_attention, which is not where I thought it was:

self_mha_output:[[[ 0.2629684 0.5438655 -0.47695604 0.43180236]
[ 0.27214473 0.5516315 -0.47251672 0.44105405]
[ 0.2637157 0.5352751 -0.46818826 0.44008902]]]
skip_x_attention:[[[ 1.2629684 0.5438655 0.523044 1.4318024 ]
[ 0.27214473 1.5516315 0.5274833 1.4410541 ]
[ 1.2637157 0.5352751 -0.46818826 1.440089 ]]]

I’m stuck here, and surprised because this step seemed straightforward:

Now add a skip connection by adding your original input x and the output of your multi-head attention layer.

We’re just talking about adding x and self_mha_output, aren’t we?

(Also I’ve tried both + and tf.keras.layers.Add() just in case, but it didn’t seem to make a difference.)
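
For reference, here is a quick standalone check with made-up values showing that the two forms do give the same result:

import tensorflow as tf

a = tf.constant([[1.0, 2.0]])
b = tf.constant([[3.0, 4.0]])
print(a + b)                          # [[4. 6.]]
print(tf.keras.layers.Add()([a, b]))  # [[4. 6.]] -- identical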

Thank you

Hint:

(x + self_mha_output)

And you pass it through one of the self.layernorm?(…) methods.


Thank you for your help!

I misinterpreted the instruction as saying not to do that until the next step, which then led me into cascading confusion.
