C5W4: Transformer Architectures with TensorFlow

Hi Russel, thanks for the good tip.
I corrected the mask, but I still get the following error. Any hints?

1 Like

I made a mistake with dk.

2 Likes

Welcome to the community, Liuzifeng!

Good to hear that you found the error on your own.

We scale the dot products so that the model can learn more easily. In this assignment, we rescale them by dividing by the square root of 𝑑𝑘, as in the formula below:
Attention(Q, K, V) = softmax(QKᵀ / √𝑑𝑘) V

Here, 𝑑𝑘 denotes the dimension of the keys; dividing by its square root scales the dot products down so the softmax doesn’t saturate (large dot products would otherwise push it into regions with vanishingly small gradients).

Scaling is simply a step to normalize the data within a particular range, since most of the time a collected dataset comprises features that vary widely in magnitude, units, and range.
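
As a rough illustration, here is a generic sketch of that scaling step in TensorFlow. This is not the assignment’s starter code, and the mask convention (1 = keep, 0 = drop) is an assumption:

```python
import tensorflow as tf

def scaled_dot_product_attention_sketch(q, k, v, mask=None):
    # Raw attention scores: shape (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(d_k), where d_k is the last dimension of the keys
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled = matmul_qk / tf.math.sqrt(dk)

    # Assumed convention: mask is 1 for positions to keep, 0 to drop
    if mask is not None:
        scaled += (1.0 - mask) * -1e9

    attention_weights = tf.nn.softmax(scaled, axis=-1)  # sums to 1 over the keys
    output = tf.matmul(attention_weights, v)
    return output, attention_weights
```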

1 Like

Thanks!!! It also helped me!!!

1 Like

I agree with some of the comments above that this exercise is unclear compared to other programming exercises in this specialization. I am pushing my way through, but it requires more trial and error than I would like in order to determine the intent of the code comments.

One hint that confused me points to a more general issue: “apply layer normalization on sum of the input and the attention output”. In past exercises, we were composing TensorFlow layers, effectively treating inputs as symbolic values. I could not figure out how it made sense just to sum up the inputs, or what I was expected to do here instead.

As I now understand it (and please correct me if I’m wrong), we’re actually overriding the internals of a new layer subclass (EncoderLayer) and that’s why we have access to the actual tensor values and can add them directly.

It’s possible that this is discussed and I missed it, but this exercise might be improved with a very visible (bold or italics) note that we’re doing a different style of implementation here. Alternatively, maybe figuring this out is part of the exercise, but it is not really covered in the rest of the specialization.
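
For concreteness, here is a rough sketch of the pattern as I now understand it. It is illustrative only, not the assignment’s actual template, and the constructor arguments are my own guesses at a typical setup:

```python
import tensorflow as tf

class EncoderLayerSketch(tf.keras.layers.Layer):
    def __init__(self, embedding_dim, num_heads, fully_connected_dim, dropout_rate=0.1):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation="relu"),
            tf.keras.layers.Dense(embedding_dim),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout_ffn = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, training=False, mask=None):
        # Inside call() the inputs are ordinary tensors, so a plain "+"
        # implements the residual connection directly.
        attn_output = self.mha(x, x, x, attention_mask=mask)       # self-attention
        out1 = self.layernorm1(x + attn_output)                    # Add & Norm
        ffn_output = self.dropout_ffn(self.ffn(out1), training=training)
        return self.layernorm2(out1 + ffn_output)                  # Add & Norm

# Quick smoke test:
layer = EncoderLayerSketch(embedding_dim=16, num_heads=2, fully_connected_dim=32)
y = layer(tf.random.uniform((1, 5, 16)), training=True)  # -> shape (1, 5, 16)
```

The key point is the plain `x + attn_output` inside call(): because we are defining the layer’s forward pass ourselves, the summands are just tensors.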

1 Like

Cell #13. Can’t compile the student’s code. Error: AssertionError(‘Weights must be a tensor’)

I passed all the tests, but after submitting it showed “Cell #13. Can’t compile the student’s code. Error: AssertionError(‘Weights must be a tensor’)”. I searched and referred to many other posts, but I cannot solve it! What should I do? Can anyone help? Thanks a lot!

1 Like

Hello @_Kelly,

Let’s look at the error message together, because it has a lot of information.

First, we want to locate cell number 13. It is the 13th CODE CELL counting from the top of your notebook. We should then find it to be this cell:

[screenshot of the notebook’s public test cell for scaled_dot_product_attention]

Now we know the error was triggered when the grader ran your scaled_dot_product_attention exercise against the public tests. You would have encountered this error yourself by running all the code in the notebook, so I suggest you do that before making a submission.

The error message says that the weights must be a tensor. Your scaled_dot_product_attention is supposed to return two objects, output and attention_weights, so the problem should be in the second one.

Here are my suggestions:

  1. Add a print statement to check the type of attention_weights, and confirm that it is not a Tensor, to verify that you have located the problem correctly.
  2. Make sure you have ONLY used TensorFlow methods to compute all the variables that finally produce attention_weights. Since the input variables can be numpy arrays, you will need to use tf.matmul instead of @ to guarantee a TensorFlow matrix multiplication (a quick check is sketched below). You should be able to complete this exercise without any numpy methods.
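
For example, here is a quick, illustrative way to see the difference (not part of the assignment; the shapes are arbitrary):

```python
import numpy as np
import tensorflow as tf

q = np.random.rand(2, 4).astype(np.float32)  # the public tests may pass plain numpy arrays
k = np.random.rand(3, 4).astype(np.float32)

print(type(q @ k.T))                            # <class 'numpy.ndarray'> -- trips the assertion
print(type(tf.matmul(q, k, transpose_b=True)))  # a tf EagerTensor -- passes
```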

Good luck!
Raymond

2 Likes

It seems absolutely nuts to me that I am reading so many comments from two years ago about how the object-oriented programming in this lesson is too far afield of everything that has been done before, how it is written in a way where students need to do much more work to figure out the precise syntax, how there are not enough templates in the coding to help people figure it out themselves…and yet nothing has changed. I see references to some “hour long meeting with 15 people” in multiple threads, but again, that meeting occurred over two years ago. The assignment, as far as I can tell, has not changed much. It’s still very opaque.

I’m currently on Exercise 6, the DecoderLayer, and I’m getting an error of “Wrong values in out”. Which values, why, where is the calculation going wrong? No idea, no feedback; it’s just wrong somewhere. At least when I was getting errors in previous exercises, they tended to be things like having the syntax wrong, so I could iterate and experiment and eventually fix it (seriously, do you realize how difficult it is to divine the syntax for the loop in the Encoder class from the course material? Without closely reading the implementation in the tensorflow.org tutorial, it would have been impossible).

Now, though, I have nothing. I feel like use_causal_mask probably needs to go somewhere involving mha1 (again, based on the tensorflow tutorial), but their implementation is different, it doesn’t look like it’s working when placed with the other call arguments, and I’ve been trained by this course not to touch code outside of the “None” blocks. So I’m leery of rewriting class structures on the off chance I can make it work, not to mention I might mess up the automatic grader.

In short, this is beyond frustrating. I feel like this went from a course where the structures were helping me learn, to one where the structures are actively impeding my learning because of the combined opaqueness of the code snippets (making it so I must construct the answer in a specific way, without giving enough information for me to figure out what that way is) and the check functions.

3 Likes

Sorry to read about your frustration, Doug_D. Perhaps sharing the output of the error would give the community a better clue about how to help here.
Editing the cells yourself and changing the testing code might not be a good idea, as you already realized, because of the automatic grader, but adding “print” statements is always helpful for getting more information about what is going on.
Best wishes,

Rosa

2 Likes

Oh, I was adding prints as well; those basically just told me where the error was occurring, though.

Anyway, I solved the problem and finished the assignment (and specialization!). Some tips for anyone else stuck on this assignment:

  • The only time you should have an “=” in the call arguments for a layer is the dropout layer, where the comments specify that you should use “training=training”. Everywhere else, just pass the variables in the correct order (according to the documentation). Don’t try to specify “query=…, value=…”. Even if you put them in the correct order, it will work, until it doesn’t.
  • Pass query, value, and key every time. The MultiHeadAttention documentation says that if you don’t pass key, it defaults to value. Don’t rely on that; just supply all three arguments in every mha() call (a short illustration follows this list).
  • The only time training should be an argument is for dropout layers, or higher-level layers that encompass a dropout layer (such as when looping through encoder layers). Keep it out of the arguments for all other low-level layers. The “Dropout will be applied during training” comment on the mha layer might make you think you need to pass it there, but you don’t: Keras propagates the training flag to sublayers automatically.
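
To make the first two tips concrete, here is a rough sketch of the calling pattern (the names and shapes are illustrative, and use_causal_mask requires a reasonably recent TensorFlow release):

```python
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)
dropout = tf.keras.layers.Dropout(0.1)
x = tf.random.uniform((1, 5, 16))  # (batch, seq_len, depth)

# Query, value, and key passed positionally, all three every time:
attn_output = mha(x, x, x, use_causal_mask=True)  # causal self-attention

# The one place "=" appears is the dropout layer's training flag:
out = dropout(attn_output, training=True)
```
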
4 Likes

Glad that you solved it :smile:
Many thanks for sharing your tips. These are very useful for other learners.

Congratulations on finishing the Specialization :confetti_ball:
Best,

Rosa

1 Like

Unfortunately, this is still true in August 2023.

4 Likes

More than two years have passed, and I confirm the assignment is still a disaster…

4 Likes

I have finished all the problems in the C5W4 programming assignment, but I am unable to submit because the grader is showing an error (“Platform Error: We’re experiencing some technical difficulties”). I have tried different browsers, cleared the browser’s cache and cookies, and redone the entire assignment, but still no luck.

2 Likes

It has been reported to the staff.

1 Like

Hi,

I am not able to submit my C5W4 Transformers Architecture with TensorFlow assignment. I purchased the course on Jan 29, 2024, and had a subscription until today.
I completed the quizzes and videos and all eight code cells, but when I tried to submit for grading, it locked me out. Now I see an ‘Upgrade to Submit’ button.

Please help.

Please contact Coursera staff as described here.

Here it is July 2024, and I absolutely agree with all the previous comments dating back three years about the excessive difficulty and confusion surrounding this lab, and it still hasn’t been addressed by the maintainers of this course. This lab assignment is virtually impossible to complete solely from the information given in the lectures or the explanations in the notebook. The tutorial on tensorflow.org is the only way to start to grasp what the hey is going on in this lab and how to complete it, and though I have successfully managed the first six exercises, with a fair bit of trial and error, I still feel quite clued out. There needs to be more explanation of how all this code fits together. I feel like I needed a somewhat advanced course in Python OOP and a thorough course in TensorFlow before really understanding what is going on here. All the other assignments in this specialization have been fine, if at times fairly challenging. This one is just over the top. It could be fixed with more explanation in the notebook, I think, but this hasn’t happened yet despite the numerous comments.

1 Like

I understand the issue you are discussing. I do not speak for DLAI; I’m just a community volunteer.

What has changed in the last three years is that there were two additional ungraded labs added to this week of the course, to provide additional examples. They are listed in the syllabus just after the lab assignment.

1 Like

I am having trouble with my EncoderLayer class in this assignment. I can’t see where I have made a mistake; I think I have followed all the instructions correctly. I have added print statements, and my outputs from each layer seem to have the correct dimensions. The only error I get is ‘Wrong values when training=True’, and there is no other indication of where the issue is.