C5W4: Transformer Architectures with TensorFlow

The last assignment of Course 5, Week 4 was really a disaster. Nothing is understandable, and it is not well explained. It is much more difficult than the previous assignments. I ended up copying from the Transformers tutorial on the TensorFlow site (without being able to understand it). PLEASE IMPROVE IT!


Hi Falonso,

Sorry to read about your frustration. The Transformers topic is indeed complex, but it is also very interesting and a hot field right now in the NN community.

Congrats on finishing the assignment despite the difficulties you mention. I believe the Coursera team tries to improve every version of the notebook based on the feedback they receive, so I encourage you to send them a specific improvement note, stating for instance the numbers of the exercises you consider poorly explained and which tips would be useful to include. Whatever worked for you can work for other students as well, so your opinion is valuable. Try sending the feedback by email from the help center.

Happy learning,



I feel better after reading this, knowing that it is indeed a bit complex. But it was rightly placed at the end of the full specialization :slight_smile:


To add a little to Arosa’s comment.

You are not the first to criticize this assignment! A few weeks ago there was a Zoom meeting between the course staff and mentors that was largely initiated by that criticism. Some of us provided detailed feedback to the course staff about our personal views of the assignment (and the Week 4 lectures). The staff are listening and are discussing the issues with the instructors.

In my personal comments I also used phrases similar to your “nothing understood”, so I know how you feel. If there are specifics about what you don’t understand, please share them so that the staff and instructors can get even more views about how it could be improved.


I pretty much agree with this sentiment. Why did it switch to object-oriented programming? I was still learning to use TensorFlow layers.


Yes, the last assignment is really a disaster.


Dear @arosacastillo ,
I have passed all the tests, but when I submit, I cannot pass the assignment. The message is: Cell #18. Can’t compile the student’s code. Error: AssertionError(‘Wrong values case 1’)
Could you please help me to solve the problem?


Hi Mrtranducdung,

Please have a read of the solutions proposed here:

It is always good practice to search the forum to see whether people have previously posted similar errors. I found many solutions on my own as a student this way :slight_smile:

Happy learning



Hi, Dheeraj here. Really feeling frustrated. I completed the whole course in full flow, but in the Transformer part I am unable to submit the answers. The grader shows me 0/100, so please help me clear the assessment.


Same frustration here. I finished all other homework by myself, except this one.


I am stuck in scaled_dot_product_attention_test with an exception: AssertionError: Wrong masked weights

No idea how to proceed.


I got the same error message.
I found out that I forgot to subtract mask from 1 before multiplying it.
When I had it like this:
scaled_attention_logits += (1-mask) * -1.0e9
… things worked out. I hope this was helpful!
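For anyone hitting the same "Wrong masked weights" error, here is a minimal NumPy sketch of why the `(1 - mask)` matters. It is not the assignment's TensorFlow solution; it assumes, as the fix above implies, that the mask holds 1 for positions to keep and 0 for positions to hide. The function name `masked_attention` and the toy arrays are made up for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def masked_attention(q, k, v, mask=None):
    """Scaled dot-product attention; mask uses 1 = keep, 0 = hide (assumed convention)."""
    d_k = k.shape[-1]
    logits = q @ k.T / np.sqrt(d_k)
    if mask is not None:
        # (1 - mask) is 1 exactly where attention must be blocked,
        # so only those logits receive the large negative offset.
        logits = logits + (1.0 - mask) * -1.0e9
    weights = softmax(logits, axis=-1)
    return weights @ v, weights

# Toy data: 2 queries, 3 keys/values, depth 2.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
k = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
v = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mask = np.array([[1.0, 1.0, 0.0]])  # hide the third key from every query

output, weights = masked_attention(q, k, v, mask)
print(weights.round(3))  # the third column is effectively zero
```

Multiplying by `mask` directly instead of `(1 - mask)` would push the large negative value onto the positions that should be kept, which is the opposite of the intended masking.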


Yes, it works, thanks!


Thanks, that was helpful and I finished the whole assignment after that.


I understand conceptually what the Q, K and V matrices are. However, at this point in the encoder code:

# calculate self-attention using mha(~1 line). Dropout will be applied during training

The call requires the matrices q, k and v as arguments, but it’s not clear where those matrices come from. Since the comment says “~1 line”, I would expect them to be readily available, but where are they? Honestly, this is not a course on TensorFlow/Keras, there are other courses for those topics. Although I have consulted TF’s docs several times in this specialization, I don’t think I should be required to dig deep into TF to finish the assignments. I should just use it as an aid.
So I would appreciate it if someone could please tell me where the heck those three matrices come from, so I can pass them to the mha() call. Again, this is not a conceptual question; it’s more about the “technicalities” of the implementation, which I should not be required to know.

Thanks a lot!


Likewise. Thanks for the help!


An mha object has already been instantiated within the EncoderLayer class, in “def __init__(…)”, which already passes the q, k and v matrices.

Within the call() method, you must call mha using self.mha(), and pass in the input “x” I believe, as such: self.mha(x)
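To make the "where do q, k and v come from" point concrete: in self-attention they are not separate variables waiting somewhere, they are all derived from the same input x by projections that live inside the attention layer. Below is a small single-head NumPy sketch of that idea. It is an illustration under assumptions, not the assignment's reference code: the weights `w_q`, `w_k`, `w_v` and the function `self_attention` are hypothetical names. For the Keras layer, note that `tf.keras.layers.MultiHeadAttention` expects at least a query and a value tensor when called (key defaults to value), so for self-attention the same x is supplied for each role.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))  # encoder input: one row per token

# Hypothetical projection weights; a real attention layer creates and
# trains these internally, so the caller never builds them by hand.
w_q = rng.normal(size=(d_model, d_model))
w_k = rng.normal(size=(d_model, d_model))
w_v = rng.normal(size=(d_model, d_model))

def self_attention(query, key, value):
    """Single-head attention where the caller supplies the three inputs."""
    q, k, v = query @ w_q, key @ w_k, value @ w_v
    logits = q @ k.T / np.sqrt(k.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Self-attention: the same tensor plays all three roles.
attn_output = self_attention(x, x, x)
print(attn_output.shape)  # (4, 8)
```

The output has the same shape as x, which is why the encoder layer can add it back to x in the residual connection that follows.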


Yes, this assignment is really way out of scope. We had just started to learn TensorFlow, then the concept of OOP was added on top of that, and the videos did not help much with the assignment either.


Hi everyone, has anyone run into the problem below?
Cell #16. Can’t compile the student’s code. Error: AssertionError(‘Wrong type. Output must be a tensor’)

This happens at the last cell in Exercise 3 - scaled_dot_product_attention.
But I also get this error:
NameError: name ‘scaled_dot_product_attention_test’ is not defined

Could anyone help to solve this problem?
Thanks a lot!


scaled_dot_product_attention_test is defined in the file public_tests.py. It’s possible that you didn’t run a cell / accidentally deleted the import.

Here’s the cell to run:

from public_tests import *


# Example
position = 4
d_model = 8
pos_m = np.arange(position)[:, np.newaxis]
dims = np.arange(d_model)[np.newaxis, :]
get_angles(pos_m, dims, d_model)

If the import was missing from the start, please refresh your workspace and try again.
See Refresh your Lab Workspace section here
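Before refreshing the workspace, a quick check along these lines can tell you whether the problem is a missing file or just a cell that was not run. This is only a sketch using the standard library; the helper names `module_available` and `missing_names` are made up here and are not part of the course notebook.

```python
import importlib.util

def module_available(name):
    """Return True if a top-level module called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None

def missing_names(required, namespace):
    """List the names from `required` that are not defined in `namespace`."""
    return [name for name in required if name not in namespace]

required = ["scaled_dot_product_attention_test"]
print("public_tests importable:", module_available("public_tests"))
print("missing test helpers:", missing_names(required, globals()))
```

If the module is importable but the helper is reported missing, re-running the import cell should be enough; if the module itself cannot be found, the file is probably gone and refreshing the workspace is the way to go.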