W4_Ex-4 | (Encoder) - Stuck with no clue what to do next! Please help!

Hi there -

I am completely stuck at the Course 5 Week 4 Exercise 4 (UNQ_C4) step. I have spent a lot of time reading through the Keras documentation, going through the function step by step, and trying several combinations based on the instructions provided, but I have no idea how to get the call() function to work.

At this point, I am completely frustrated and am close to calling it quits. I’ve managed to work through all prior assignments from Course 1 through Course 5 Week 3 on my own (after a bit of struggling in some cases), but I feel like I’ve hit an impenetrable wall this time around.

I don’t want to post my code as it likely goes against the Honor Code, but I think my issue is with the initial call to the self.mha() layer:


    # calculate self-attention using mha(~1 line)
    attn_output = self.mha()  # Self attention (batch_size, input_seq_len, fully_connected_dim)

Can anyone point me in the right direction?



Lots of students have difficulty with this assignment. It isn’t very well written.

Have you tried searching on the forum here for posts from other students? There has been a lot of discussion about it.

/// Update 11/2022 ///
For self attention, you call self.mha(…) with x for all three K, Q, and V arguments, and also pass the mask. This is discussed in Sections 3, 4, and 4.1.
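To illustrate the reply above outside the assignment, here is a minimal standalone sketch using the stock `tf.keras.layers.MultiHeadAttention` layer. It is not the notebook's solution: the tensor sizes, the toy padding mask, and the variable names are made up for illustration, and keyword arguments are used so the call does not depend on positional order. In this Keras layer, a mask value of True (or 1) means the position may be attended to.

```python
import tensorflow as tf

batch_size, seq_len, embedding_dim = 2, 4, 8  # made-up sizes

# Stock Keras layer; num_heads and key_dim are arbitrary illustrative values.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=embedding_dim)

x = tf.random.normal((batch_size, seq_len, embedding_dim))

# Toy padding mask: True = attend, False = ignore.
# Here the last token of each sequence is treated as padding.
key_is_real = tf.constant([[True, True, True, False],
                           [True, True, True, False]])
# Expand to (batch_size, target_len, source_len): every query sees the same keys.
attention_mask = tf.tile(key_is_real[:, tf.newaxis, :], [1, seq_len, 1])

# Self-attention: the same tensor x is used as query, value, and key.
attn_output, attn_scores = mha(
    query=x,
    value=x,
    key=x,
    attention_mask=attention_mask,
    return_attention_scores=True,
)

print(attn_output.shape)  # (2, 4, 8)
print(attn_scores.shape)  # (2, 2, 4, 4)
```

The attention weights on the masked (last) key position come out as zero, which is the whole point of passing the mask to the attention call.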

Thanks @TMosh! I tried looking through some of the existing threads but couldn’t find anything that was immediately helpful to get me unstuck.
I can give it another go (after my brain decompresses a bit), but am not sure how this will help! Would appreciate other suggestions!


@santoshsastry I’m stuck there too. This assignment is by far the most challenging… and it lacks some of the typical hints in the code that would help a little more.
So far, reading the Transformer documentation has helped me a lot.

@kleber - yeah, it is challenging and the notebook documentation is not very helpful. I had to spend a ton of time reviewing the TF/keras documentation and looking through examples to understand how to proceed further. I managed to complete the course successfully, but it took a lot more effort and time than I had anticipated.

Best of luck!

I also managed to complete it. But I must say this last week’s topic is incredibly complex… Reading the Transformer/Encoder/Decoder documentation is absolutely essential for this assignment. I think it could be better designed so the student could absorb the concepts more concretely. I’ll think about how to give good feedback on it.

I am also stuck here. Can any of you who completed it help me out?

You can try searching the TensorFlow documentation: Text | TensorFlow

Hi @TMosh
I got this error and have been stuck here for a long time. I read a blog post that suggests using the mask in the attention layer, but I don’t know how. Could you please help me?

In EncoderLayer(), the only use of “mask” is with self.mha().
The only use of “training=training” is with self.dropout_ffn().
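As a rough illustration of why the two arguments end up in different places, the sketch below uses a stock Keras `Dropout` layer rather than the assignment's classes. A mask describes the data (which positions attention may look at), so it goes to the attention call, while `training` only switches behaviour that differs between training and inference, such as dropout. The dropout rate and tensor shape here are arbitrary choices for the example.

```python
import tensorflow as tf

dropout = tf.keras.layers.Dropout(rate=0.5)  # arbitrary illustrative rate
x = tf.ones((1, 4, 8))                       # made-up input

# Inference: dropout is a no-op, so the input passes through unchanged.
out_inference = dropout(x, training=False)

# Training: each element is either zeroed or scaled by 1 / (1 - rate) = 2.
out_training = dropout(x, training=True)

print(bool(tf.reduce_all(out_inference == x)))  # True
print(sorted(set(out_training.numpy().ravel().tolist())))
```

Which elements are dropped is random, so the second line of output can vary between runs.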


Thank you so much!! :heart:
I solved this problem.

Didn’t solve the problem for me. Variable names are awful. Whoever wrote this needs to look at the quality of the other programming exercises and learn from them.


Your answer saved my day. But why is mask used differently from training as a parameter?
I don’t want to post the code here, but this seems very strange to me.

The masking is discussed in Sections 2.1 and 2.2.