C5_W4_A1 DON'T PANIC! Transformer help is on the way

pslusarz · May 23, 2022, 1:46am

As noted elsewhere on the forum, the Transformer lab is a bit of a train wreck: there isn't enough explanation in the lectures, and the assignment doesn't offer enough hints. While the course staff works on updating this week's content, some of us would like to finish it anyway so we can move on to other courses. The purpose of this post is to give others a fighting chance. Before I go any further: don't despair if you get stuck right away. The first few exercises are the hardest ones if your only exposure to TensorFlow has been through this course. Things get a lot easier further down, and you should be able to complete the last few exercises with almost no hints. Ok, here goes.

For an explanation of transformers, check out this blog post: Transformers from scratch | peterbloem.nl. It is not only thorough, it also has fantastic references. If you follow its links, you can go as far back as your knowledge gaps take you, all the way to multi-dimensional differentials if need be. If you only have limited time, watch the videos linked at the top of the post.

First exercise. The word "dimension" comes up a lot. If you're going from one-hot encoding to an embedding, it's the number of dimensions (output nodes) your embedding will have. That number, d in the first function, is also the dimension of the keys and queries, for reasons explained in the link above. Don't forget the obligatory cast of the integer d to float32.
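To make the role of d concrete, here is a minimal NumPy sketch of the angle computation, assuming the standard formula from the original Transformer paper, pos / 10000^(2i/d) with i = k // 2. The name get_angles and its signature are my assumptions about what "the first function" looks like, so check your notebook for the actual ones.

```python
import numpy as np

def get_angles(pos, k, d):
    """Hypothetical helper: angle for every (position, embedding-dimension) pair.

    pos: column vector of positions, shape (max_len, 1)
    k:   row vector of dimension indices 0..d-1, shape (1, d)
    d:   embedding size (also the key/query dimension)
    """
    i = k // 2  # dimensions 2i and 2i+1 share the same frequency
    # Cast d to float32 so the exponent is computed in floating point
    return pos / np.power(10000, (2 * i) / np.float32(d))

angles = get_angles(np.arange(3)[:, np.newaxis], np.arange(4)[np.newaxis, :], 4)
print(angles.shape)  # (3, 4)
```

Broadcasting a (3, 1) column against a (1, 4) row is what produces one angle per (position, dimension) pair.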

Second exercise. It uses step slicing ("::"), which wasn't introduced before, plus some np.arange transformations to get the shapes right. If you're still stuck, check out the ungraded lab; it has the expected code.
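Here is a minimal NumPy sketch of how the step slicing and np.arange fit together, assuming the usual layout from the original Transformer paper (sine on even dimensions, cosine on odd ones). The function names and the final (1, positions, d) shape are my assumptions; the notebook most likely wants a TensorFlow tensor at the end (for example via tf.cast), so treat this as a shape check rather than the graded answer.

```python
import numpy as np

def get_angles(pos, k, d):
    # Assumed angle formula: pos / 10000^(2*(k//2)/d)
    return pos / np.power(10000, (2 * (k // 2)) / np.float32(d))

def positional_encoding(positions, d):
    # np.arange builds the index vectors; np.newaxis turns them into a column and a row,
    # so broadcasting yields a (positions, d) matrix of angles
    angle_rads = get_angles(np.arange(positions)[:, np.newaxis],
                            np.arange(d)[np.newaxis, :],
                            d)
    # Step slicing: "0::2" selects the even columns, "1::2" the odd columns
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
    # Add a leading batch axis: shape (1, positions, d)
    return angle_rads[np.newaxis, ...].astype(np.float32)

pe = positional_encoding(positions=50, d=16)
print(pe.shape)  # (1, 50, 16)
```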

Third exercise. I suggest you still stumble through all the steps and do some debugging before looking at the tutorial linked below. Off the bat, I had about 80% of the code right and managed to get to 90% on my own (not sure how long I spent debugging, but probably less than an hour). It was helpful to see where I went wrong. The code for the exercise is largely the same as in the TensorFlow Transformer tutorial: Transformer model for language understanding | Text | TensorFlow
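The core of this exercise is scaled dot-product attention, softmax(QK^T / sqrt(d_k) + mask) V. Below is a minimal NumPy sketch, not the graded TensorFlow solution. It assumes mask entries are 1 for positions to keep and 0 for positions to hide, and pushes hidden logits to a large negative number before the softmax. Mask conventions differ between sources, so check which one your notebook expects.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v, mask=None):
    """q: (..., seq_q, d_k), k: (..., seq_k, d_k), v: (..., seq_k, d_v).

    Assumed mask convention: 1 = attend, 0 = hide (broadcastable to (..., seq_q, seq_k)).
    """
    matmul_qk = q @ np.swapaxes(k, -1, -2)   # (..., seq_q, seq_k)
    dk = np.float32(k.shape[-1])             # key dimension as a float
    logits = matmul_qk / np.sqrt(dk)
    if mask is not None:
        logits = logits + (1.0 - mask) * -1e9  # hidden positions get ~zero weight
    weights = softmax(logits, axis=-1)       # each row sums to 1 over the keys
    return weights @ v, weights

q = k = np.eye(2)
v = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[1.0, 0.0], [1.0, 1.0]])  # query 0 may not look at key 1
out, w = scaled_dot_product_attention(q, k, v, mask)
print(out[0])  # [1. 2.]
```

Because query 0 is only allowed to see key 0, its output is exactly the first row of v.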

Fourth exercise. This one is actually not that hard, and things only get easier from there. At this point, I suggest you resist the temptation to follow the tutorial above. Also, this is the first time we're seeing objects in Python, so remember to refer to all of the object's variables with the self prefix, i.e. "self.mha(…)". self is also the first parameter of the methods defined on that object, but you don't pass it when calling the method on an instance: a method "call(self, args)" defined on class "SomeLayer" is invoked as "someLayer(args)", where someLayer is an instance of SomeLayer (Keras routes this through the layer's __call__ method). The sticking point for me was what to pass to the MultiHeadAttention layer as the query, key, and value arguments. You need to pass the input x in all three positions; the layer figures it out from there. The fourth argument is the mask, and you will actually get a meaningful error message if you forget it.
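If the self/call business is new to you, here is a toy, framework-free Python sketch of the pattern. The Layer base class is a hypothetical stand-in for a Keras layer (in Keras, calling the instance goes through __call__, which runs your call method), and ToySelfAttention is a made-up single-head layer, not the assignment's encoder layer. It shows attributes stored on self in __init__, self not being passed explicitly at call time, and the same x used as query, key, and value, with the mask as the fourth argument. In the real notebook, the corresponding line is probably something like self.mha(x, x, x, mask).

```python
import numpy as np

class Layer:
    """Hypothetical stand-in for a Keras layer: calling the instance runs its call() method."""
    def __call__(self, *args, **kwargs):
        return self.call(*args, **kwargs)

def attention(q, k, v, mask):
    # Scaled dot-product attention; assumed mask convention: 1 = attend, 0 = hide
    logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(np.float32(k.shape[-1]))
    logits = logits + (1.0 - mask) * -1e9
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v

class ToySelfAttention(Layer):
    def __init__(self, d, seed=0):
        rng = np.random.default_rng(seed)
        # Store the weights on self so that call() can reach them later
        self.wq = rng.standard_normal((d, d))
        self.wk = rng.standard_normal((d, d))
        self.wv = rng.standard_normal((d, d))

    def call(self, x, mask):
        # Self-attention: the same input x feeds the query, key and value projections,
        # and the mask goes in as the fourth argument
        return attention(x @ self.wq, x @ self.wk, x @ self.wv, mask)

layer = ToySelfAttention(d=4)
x = np.ones((3, 4))
out = layer(x, np.ones((3, 3)))  # no self here: Python supplies it
print(out.shape)  # (3, 4)
```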

Exercises five through eight. You should be able to do them without additional hints.

Good luck!

TMosh · May 23, 2022, 2:32am

Thanks for your suggestions.

Deepti_Prasad · August 17, 2024, 6:12pm

Thank you @sushantnair for adding your perspective. The reason we tell learners to always create a new topic is that if the original poster leaves this platform, the comment trail is lost as well, and replies on old threads can sometimes confuse other learners.

sushantnair · August 17, 2024, 6:20pm

Yes, I understand. But I also felt it was better to keep the context together. Anyway, I believe it's a case of pros and cons.
But thinking about it, next time I could create a new post and link to the older one. Then we would get both advantages.

Deepti_Prasad · August 17, 2024, 6:26pm

You are getting better at creative ideas!

sushantnair · August 17, 2024, 6:26pm

Note that to follow community guidelines I have created a new post here: A perspective for C5W4A1 EX4

Andrew_Jewett · October 2, 2024, 3:23am

Thank you for reminding me about the mask. I completely forgot about the mask. This was the missing piece for me. This advice was really helpful.
