C5_W4_A1_Ex-4_EncoderLayer

Hi!
I am having a terrible experience with the last programming assignment (probably because I am a newbie to Python, TensorFlow, and Keras). The descriptions and hints are very unhelpful.
The particular problem I have right now: running the test for EncoderLayer, I get ‘call() missing 1 required positional argument: ‘value’’ on the attn_output = … line. (By the way, I put ‘attention_mask=mask’ after the input by intuition only - probably the wrong way to go, but I have no idea how to handle this.) Please help!

2 Likes

Hi,
to use the MultiHeadAttention layer, you have to specify query, value, and key.
In the documentation (tf.keras.layers.MultiHeadAttention | TensorFlow Core v2.5.0) you can read: if query, key, and value are the same, then this is self-attention.
This is simply to say that, for a given timestep t: q = Wq·x, k = Wk·x, v = Wv·x, with the same input x.

So basically, you have to pass query=x, value=x, and key=x (key is optional; if missing, key = value by default) as arguments to self.mha.
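For reference, here is a minimal standalone sketch (not the assignment code; the shapes and layer sizes are just for illustration):

```python
import tensorflow as tf

# Self-attention: the same tensor x is passed as query, value, and key.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)
x = tf.random.uniform((1, 10, 16))   # (batch, seq_len, embedding_dim)
out = mha(query=x, value=x, key=x)   # a mask can be passed as attention_mask=mask
print(out.shape)                     # (1, 10, 16)
```

And yes, passing the mask by keyword as attention_mask=mask is the right way to do it.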

7 Likes

Thank you, Dyxuki! That helped.
Now I am at the full Encoder(), where I get ‘‘ListWrapper’ object is not callable’ on the call to self.enc_layers (I pass it x, training, and mask). Any hints? By the way, could you explain why we have the loop over range(self.num_layers) both where self.enc_layers is built and in the main code? Thank you!

Well, the error says it all :)
self.enc_layers is a list, so instead you should take the elements out of it and call those.

self.num_layers simply says that you have that many EncoderLayer instances.
Namely, you can see “self.enc_layers” as [EncoderLayer, EncoderLayer, EncoderLayer, …, EncoderLayer], whose length is self.num_layers.
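If it helps, here is an illustrative standalone sketch of the same pattern with generic Dense layers (not the assignment's EncoderLayer):

```python
import tensorflow as tf

# A Python list of layers is not callable itself; you index into it
# and call each element in turn.
layer_stack = [tf.keras.layers.Dense(8) for _ in range(3)]  # like num_layers = 3
x = tf.random.uniform((1, 8))
for i in range(len(layer_stack)):   # same idea as looping over range(self.num_layers)
    x = layer_stack[i](x)
print(x.shape)                      # (1, 8)
```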

2 Likes

Thank you Dyxuki, clear and (now) simple. Now everything works, except I get ‘Wrong values’… fixed… Now I'm struggling with ‘Wrong values in outd’ in the Decoder… :)

1 Like

You are welcome!
Apparently the wrong outd values problem is a bug in the assignment notebook, not yet fixed; I also got it when doing the assignment. The thread is here:
https://community.deeplearning.ai/t/week-4-assignment-1-decoder-class-error

Thanks for the tip. After some struggling, and following that thread, I managed to finish the course!

Hi Dalkhat -
I am running into an issue with the Course 5 Week 4 Exercise 5 full Encoder (UNQ_C5) step. I am having trouble adding the position encoding to the embedding and I am not quite sure how to resolve the error I get in the log.

  1. Add the position encoding: self.pos_encoding[:, :seq_len, :] to your embedding.

I am passing in x[:, :seq_len, :] for the x from the previous step, but this throws an error saying the object is not callable. Can you help point me in the right direction?

Thank you very much.
/santosh

Never mind, I figured out that I should not be slicing x.
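For anyone who hits the same error, here is a minimal standalone sketch of what the addition looks like (the names and shapes are just illustrative): only the positional encoding is sliced down to seq_len; x keeps its full shape and the two are added by broadcasting.

```python
import tensorflow as tf

# x keeps its full shape; only pos_encoding is sliced to the current seq_len.
batch, seq_len, d_model = 2, 5, 16
x = tf.random.uniform((batch, seq_len, d_model))    # embedded input
pos_encoding = tf.random.uniform((1, 50, d_model))  # precomputed positional encodings
x = x + pos_encoding[:, :seq_len, :]                # slice pos_encoding, not x
print(x.shape)                                      # (2, 5, 16)
```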

Thanks!

Hi @Dyxuki ,

Given: q = Wq·x, k = Wk·x, v = Wv·x, with the same input x

If we conclude that query=x, value=x, and key=x, then what is going on with Wq, Wk, Wv?

Hello,
it’s just the notation that is a little confusing.
The arguments “query”, “value”, and “key” should be seen as the input vectors used to compute q, v, and k respectively.
So, namely, the argument “query” is not “q” (the notation from the course), but rather something like:
q (actual) = Wq · “query” (the input argument)

Wq, Wk, and Wv are weights of the layer that will be learned.
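You can see this directly in Keras; a quick standalone sketch (arbitrary sizes, not the assignment code) that builds the layer and lists its trainable weights, which include the query/key/value projection kernels:

```python
import tensorflow as tf

# The projection weights live inside the layer and are learned during training;
# the call arguments are only the inputs those projections are applied to.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)
x = tf.random.uniform((1, 10, 16))
_ = mha(query=x, value=x, key=x)   # the first call builds the weights
for w in mha.trainable_weights:
    print(w.name, w.shape)         # query/key/value/output kernels and biases
```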

1 Like

Hi @Dyxuki, I tried to view “query”, “value”, and “key” as three different aspects of the input x that provide a richer representation of x's features, rather than binding them to the notation.

It works well for me.
Thank you for your hint. ;)

Hi @Dyxuki

I’m also confused about this: how do the query, value, and key arguments of the call method relate to the variables from the lecture shown here, both in the case where all three are equal (self-attention) and when they are all different?

Self-attention:
[slide image]

Multi-head attention:
[slide image]
Regarding the weights, how are Wq, Wk, Wv related to W_i^{<Q>}, W_i^{<K>}, W_i^{<V>} (multi-head attention) and W^{<Q>}, W^{<K>}, W^{<V>} (self-attention) from the slides above?

Q, K, and V are the Query, Key, and Value.
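Roughly, in the standard multi-head formulation, each head i computes head_i = Attention(Q·W_i^{<Q>}, K·W_i^{<K>}, V·W_i^{<V>}), and the heads are then concatenated and projected with one more learned matrix: MultiHead(Q, K, V) = Concat(head_1, …, head_h)·W_o. Self-attention is the special case where Q, K, and V are all the same input x, so the single-head W^{<Q>}, W^{<K>}, W^{<V>} from the self-attention slide play the same role as the per-head matrices when there is only one head.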

1 Like

Since Wq, Wk, and Wv are parameters to learn, if we pass query=x, value=x, key=x, does that mean we effectively initialize Wq, Wk, Wv to 1 (instead of initializing them randomly, as we do in other algorithms)?

@Damon @Dyxuki

@GordonRobinson
@Kic
@edwardyu
@laacdm

UNQ_C4: How do I apply dropout only during training in one line of code? How should I use the training variable for this?

Also, I’m getting the error ‘The first argument to Layer.call must always be passed.’ while applying self.layernorm1(tf.math.add(x, atta_output)).

1 Like

You can use a single line of code because the constructor for this class defines a dropout layer - dropout_ffn - that does exactly what you need once you call it with the training argument.
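As a standalone illustration (a generic Dropout layer, not the assignment's exact code), the underlying behaviour is that a Keras Dropout layer only drops units when its training argument is True:

```python
import tensorflow as tf

# Dropout is active only when training=True; at inference it is a no-op.
dropout = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 4))
print(dropout(x, training=True))    # entries randomly zeroed, the rest scaled by 1 / (1 - 0.5)
print(dropout(x, training=False))   # unchanged
```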