Hi Russel, thanks for the good tip.
I corrected the mask, but still have the following error, any hints?
I made a mistake with dk.
Welcome to the community Liuzifeng!
Good to hear that you found the error on your own.
We scale the vector to make it easier for the model to learn from the query. Here, in this assignment, we are re-scaling the attention scores by dividing them by the square root of dk, as in the formula below:
Attention(Q, K, V) = softmax(Q Kᵀ / √dk) V
dk, here, is the dimension of the keys, and dividing by its square root scales everything down so the softmax doesn't explode.
Scaling is nothing but a step to normalize the data within a particular range, since most of the time a collected data set comprises features that vary widely in magnitude, units, and range.
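For anyone who wants to see the scaling step concretely, here is a minimal sketch of scaled dot-product attention. This is not the assignment's exact solution, and the mask convention here (1 = keep, 0 = mask out) is an assumption that may differ from your notebook:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # Raw attention scores: (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(dk), the dimension of the keys, so the
    # dot products don't grow with dk and saturate the softmax
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(dk)

    if mask is not None:
        # Push masked positions toward -inf so softmax gives them ~0
        scaled_logits += (1.0 - mask) * -1e9

    # Weights sum to 1 over the key axis
    attention_weights = tf.nn.softmax(scaled_logits, axis=-1)
    output = tf.matmul(attention_weights, v)
    return output, attention_weights
```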
Thanks!!! It also helped me!!!
I agree with some of the comments above that this exercise is unclear compared to other programming exercises in this specialization. I am pushing my way through, but it requires more trial and error than I would like in order to determine the intent of the code comments.
One hint that confused me shows a more general issue: "apply layer normalization on sum of the input and the attention output". In past exercises, we were composing tensorflow layers, effectively treating inputs as symbolic values. I could not figure out how it made sense just to sum up the inputs and what I was expected to do here instead.
As I now understand it (and please correct me if I'm wrong), we're actually overriding the internals of a new layer subclass (EncoderLayer) and that's why we have access to the actual tensor values and can add them directly.
It's possible that this is discussed and I missed it, but this exercise might be improved with a very visible (bold or italics) note that we're doing a different style of implementation here. Alternatively, maybe figuring this out is part of the exercise, but it is not really covered in the rest of the specialization.
Cell #13. Can't compile the student's code. Error: AssertionError('Weights must be a tensor')
I passed all the tests, but after submitting it showed "Cell #13. Can't compile the student's code. Error: AssertionError('Weights must be a tensor')". I searched and referred to many other posts, but cannot solve it!!! Crazy!! What should I do??? Anyone able to help? Thanks a lot!!!
Hello @_Kelly,
Letâs look at the error message together, because it has a lot of information.
First, we want to locate the cell number 13. It is the 13th CODE CELL counting from the top of your notebook. Then we should find this to be the cell:
Now, we know the error was triggered when it was testing your scaled_dot_product_attention exercise against the public tests. You should have encountered this error if you ran through all the code in the notebook, so I suggest you do that before making a submission.
The error message said that the weights are not a Tensor. Your scaled_dot_product_attention is supposed to return two objects, output and attention_weights, so the problem should be in the second object.
Here are my suggestions:
- Add a print to check the type of attention_weights, and confirm that it is indeed not a Tensor, to verify that you have located the problem correctly.
- Make sure you have ONLY used TensorFlow methods to work through all the variables that finally produce attention_weights. Since the input variables can be numpy arrays, you will need to use tf.matmul instead of @ to make sure it is a TensorFlow matrix multiplication. You should be able to complete this exercise without any numpy methods.
Good luck!
Raymond
It seems absolutely nuts to me that I am reading so many comments from two years ago about how the object-oriented programming in this lesson is too far afield of everything that has been done before, how it is written in a way where students need to do much more work to figure out the precise syntax, how there are not enough templates in the coding to help people figure it out themselves… and yet nothing has changed. I see references to some "hour-long meeting with 15 people" in multiple threads, but again, that meeting occurred over two years ago. The assignment, as far as I can tell, has not changed much. It's still very opaque.
Iâm currently on Exercise 6, the DecoderLayer, and Iâm getting an error of âWrong values in outâ. Which values, why, where is the calculation going wrong? No idea, no feedback, itâs just wrong somewhere. At least when I was getting errors in previous exercises, they tended to be things like I had the syntax wrong, so I could iterate and experiment and eventually fix it (seriously, do you realize how difficult it is to divine the syntax for the loop in the Encoder class based on the course material? Without closely reading the implementation in the tensorflow.org tutorial, it would have been impossible). Now though, I have nothing. I feel like use_causal_mask probably needs to go somewhere involving mha1 (again, based on looking at the tensorflow tutorial), but their implementation is different, it doesnât look like itâs working when placed with the other call arguments, and Iâve been trained by this course not to touch code outside of the âNoneâ blocks, so Iâm leery of re-writing class structures on the off chance I can make it work, not to mention I might mess up the automatic grader.
In short, this is beyond frustrating. I feel like this went from a course where the structures were helping me learn, to one where the structures are actively impeding my learning because of the combined opaqueness of the code snippets (making it so I must construct the answer in a specific way, without giving enough information for me to figure out what that way is) and the check functions.
Sorry to read about your frustration, Doug_D. Maybe sharing the output of the error would give the community a better clue on how to help here.
Editing the cells yourself and changing testing code might not be a good idea, as you already realized, because of the automatic grader, but adding prints is always helpful to get more information about what is going on.
Best wishes,
Rosa
Oh, I was adding prints as well, that basically just told me where the error was occurring though.
Anyway, I solved the problem and finished the assignment (and specialization!). Some tips for anyone else stuck on this assignment:
- The only time you should have an "=" in the argument calls for a layer is in the dropout layer, where the comments specify that you should use "training=training". Everywhere else, just pass the variables in the correct order (according to the documentation). Don't try to specify "query = …, value = …". Even if you do them in the correct order, it will work, until it doesn't.
- Pass query, value, and key every time. The MultiHeadAttention documentation says if you don't pass key, it'll default to value. Don't rely on that. Just put all three arguments in every time you use an mha() call.
- The only time training should be an argument is for dropout layers, or higher-level layers that encompass a dropout layer (like looping through encoding layers). Keep it out of the arguments for all other low-level layers. The "Dropout will be applied during training" comment regarding the mha layer might make you think that you need to include it in that layer… but no, it works automatically, somehow.
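A small standalone sketch of these call conventions, using hypothetical layer instances rather than the assignment's exact code:

```python
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)
dropout = tf.keras.layers.Dropout(0.1)

x = tf.random.uniform((1, 5, 8))

# Positional order per the Keras docs is query, value, key.
# Passing all three explicitly avoids relying on key defaulting to value.
attn_out = mha(x, x, x)

# `training` only matters for layers whose behavior differs between
# training and inference, like dropout
out = dropout(attn_out, training=True)
```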
Glad that you solved it!
Many thanks for sharing your tips. These are very useful for other learners.
Congratulations on finishing the Specialization
Best,
Rosa
Unfortunately, this is still true in August 2023.
More than two years have passed, and I confirm the assignment is still a disaster…
I have finished all the problems in the C5W4 programming assignment, but I am unable to submit because the grader is showing an error ("Platform Error: We're experiencing some technical difficulties"). I have tried different browsers, clearing the browser's cache and cookies, and redoing the entire assignment, but still no luck.
It has been reported to the staff.
Hi,
I am not able to submit my C5W4 Transformers Architecture with TensorFlow assignment. I purchased the course on Jan 29, 2024 and have a subscription through today.
I completed the quiz and videos and all eight code cells, but when I tried to submit for grading, it locked me out. Now I see an "Upgrade to Submit" button.
Please help.
Please contact Coursera staff as described here.
Here it is July 2024, and I absolutely agree with all the previous comments dating back three years about the excessive difficulty and confusion surrounding this lab, and it still hasn't been addressed by the maintainers of this course. This lab assignment is virtually impossible to complete solely from the information given in the lectures or the explanations in the notebook. The tutorial on tensorflow.org is the only way to start to grasp what the hey is going on in this lab and how to complete it, and though I have successfully managed the first six exercises, with a fair bit of trial and error, I still feel quite clued out. There needs to be more explanation of how all this code fits together. I feel like I needed to have completed a somewhat advanced course in Python OOP and a thorough course in TensorFlow before really understanding what is going on here. All other assignments in this specialization have been fine, if at times fairly challenging. This one is just over the top. It could be fixed with more explanation in the notebook, I think, but this hasn't happened yet despite the numerous comments.
I understand the issue you are discussing. I do not speak for DLAI, I'm just a community volunteer.
What has changed in the last three years is that there were two additional ungraded labs added to this week of the course, to provide additional examples. They are listed in the syllabus just after the lab assignment.
I am having trouble with my EncoderLayer class in this assignment. I can't see where I have made a mistake; I think I have followed all the instructions correctly, I have added print statements, and my outputs from each layer seem to have the correct dimensions. The only error I get is "Wrong values when training=True", with no other indication of where the issue is.