I am completely stuck at Course 5 Week 4, Exercise 4 (UNQ_C4). I have spent a lot of time reading through the Keras documentation, going through the function step by step, trying several combinations based on the instructions provided, etc., but I have no idea how to get the call() function to work.
At this point, I am completely frustrated and close to calling it quits. I’ve managed to work through all the prior assignments, from Course 1 through Course 5 Week 3, on my own (after a bit of struggling in some cases), but I feel like I’ve hit an impenetrable wall this time around.
I don’t want to post my code, as that would likely go against the Honor Code, but I think my issue is with the initial call to the self.mha() layer.
Lots of students have difficulty with this assignment - it isn’t very well-written.
Have you tried searching on the forum here for posts from other students? There has been a lot of discussion about it.
/// Update 11/2022 ///
For self-attention, you call self.mha(…) with x as all three of the query, key, and value arguments, and you also pass the mask. This is discussed in Sections 3, 4, and 4.1.
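To make that concrete, here is a minimal, generic sketch of a self-attention call with tf.keras.layers.MultiHeadAttention. This is not the assignment's solution; the layer configuration and tensor shapes are purely illustrative.

```python
import tensorflow as tf

# Illustrative layer and inputs (the shapes and sizes are made up for this example)
mha = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=64)

x = tf.random.uniform((2, 5, 64))   # (batch_size, seq_len, embedding_dim)
mask = tf.ones((2, 5, 5))           # attention mask, broadcastable over the heads dimension

# Self-attention: the same tensor x is passed as query, value, and key,
# and the mask goes in through the attention_mask argument.
attn_output = mha(query=x, value=x, key=x, attention_mask=mask)
print(attn_output.shape)            # (2, 5, 64)
```

Note that the positional argument order of this layer's call is (query, value, key), which is why using keyword arguments, as above, is the safer habit.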
Thanks @TMosh! I tried looking through some of the existing threads but couldn’t find anything that was immediately helpful to get me unstuck.
I can give it another go (after my brain decompresses a bit), but I’m not sure how that will help! I would appreciate other suggestions!
@santoshsastry I’m stuck there too. This assignment is by far the most challenging, and it lacks some of the typical hints in the code that would help a little more.
So far, reading the Transformer documentation has helped me a lot.
@kleber - yeah, it is challenging, and the notebook documentation is not very helpful. I had to spend a ton of time reviewing the TF/Keras documentation and looking through examples to understand how to proceed. I managed to complete the course successfully, but it took a lot more effort and time than I had anticipated.
I also managed to complete it. But I must say this last week’s topic is incredibly complex. Reading the Transformer/Encoder/Decoder documentation is absolutely essential for this assignment. I think it could be better designed so that students could absorb the concepts more concretely. I’ll think about how to give good feedback on it.
Hi @TMosh
I got this error and have been stuck here for a long time. I read a blog post that suggests using the mask in the attention layer, but I don’t know how. Could you please help me?
Your answer saved my day. But why is the mask passed differently from the training parameter?
I don’t want to post the code here, but this is very strange to me.
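On the mask vs. training question, here is a rough sketch of the distinction, assuming the standard tf.keras.layers.MultiHeadAttention and Dropout layers (the class and argument names below are illustrative, not taken from the assignment): training is a Keras-level flag that layers such as Dropout use to switch between training and inference behaviour, whereas attention_mask is an ordinary, layer-specific argument of MultiHeadAttention, so it has to be passed explicitly to that particular call.

```python
import tensorflow as tf

# Hypothetical minimal block illustrating the difference between
# the `training` flag and an explicit attention mask.
class TinyBlock(tf.keras.layers.Layer):
    def __init__(self, embed_dim=64, num_heads=2, rate=0.1):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.dropout = tf.keras.layers.Dropout(rate)

    def call(self, x, training=None, mask=None):
        # The mask is data: it goes to the one layer that needs it,
        # through MultiHeadAttention's attention_mask argument.
        attn_out = self.mha(query=x, value=x, key=x, attention_mask=mask)
        # training is a framework-level flag: Dropout (and similar layers)
        # use it to behave differently during training vs. inference.
        return self.dropout(attn_out, training=training)
```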