As noted elsewhere on the forum, the Transformer lab is a bit of a train wreck, with too little explanation in the lectures and too few hints in the assignment. While the course staff is working on updating the Week 5 content, some of us would like to finish it anyway so we can move on to other courses. The purpose of this post is to give others a fighting chance. Before I go any further: don’t despair if you get stuck right away. The first few exercises are the hardest ones if your only exposure to TensorFlow has been through this course. Things get a lot easier further down, and you should be able to complete the last few exercises with almost no hints. OK, here goes.
For an explanation of transformers, check out this blog post. It is not just thorough, it also has fantastic references. If you follow its links, you can go as far back as your knowledge gaps take you, all the way to multi-dimensional differentials if need be. If you only have limited time, watch the videos linked at the top: Transformers from scratch | peterbloem.nl
First exercise. The word “dimension” comes up a lot. If you’re going from a one-hot encoding to an embedding, it’s the number of output nodes (the length of the vector) the embedding produces. That number, d in the first function, is also the dimension of key and query, for reasons explained in the link above. Don’t forget the obligatory cast of the integer d to float32.
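As a minimal sketch of that cast (not the lab's code): the value of d and the exponent below are made up for illustration, and NumPy is assumed. In TensorFlow the equivalent cast would be tf.cast(d, tf.float32).

```python
import numpy as np

d = 4                      # hypothetical embedding size, a plain Python int
d_float = np.float32(d)    # explicit cast before using d in floating-point math

# Hypothetical use: a fractional quantity that depends on d.
exponent = 2 / d_float
print(type(d), d_float.dtype, exponent)
```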
Second exercise. This one uses stepped slicing (the “::” syntax), which was not introduced before, plus some reshaping around np.arange to get the shapes right. If you’re still stuck, check out the ungraded lab; it has the expected code.
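If the “::” syntax is new to you, here is a small sketch of how stepped slices and np.arange behave in NumPy. The array sizes and variable names are made up for illustration, and this is not the exercise solution.

```python
import numpy as np

values = np.arange(10)        # 1-D array of the integers 0 through 9
evens = values[0::2]          # start at index 0, take every 2nd element
odds = values[1::2]           # start at index 1, take every 2nd element

# np.arange gives a 1-D array; np.newaxis adds an axis so shapes broadcast.
rows = np.arange(4)[:, np.newaxis]   # shape (4, 1), a column
cols = np.arange(6)[np.newaxis, :]   # shape (1, 6), a row
grid = rows * cols                   # broadcasts to shape (4, 6)

# Stepped slices also work on the left-hand side of an assignment.
grid_f = grid.astype(np.float32)
grid_f[:, 0::2] = -1.0               # overwrite the even-numbered columns only
```

The same start::step pattern lets you treat even and odd columns of a matrix differently, which is where it tends to come up.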
Third exercise. I suggest you still stumble through all the steps and do some debugging before looking at the link. Off the bat, I had about 80% of the code right, and managed to get up to 90% on my own (not sure how long the debugging took, but probably less than an hour). It was helpful to see where I went wrong. The code for the exercise is largely the same as in the TensorFlow Transformer tutorial: Transformer model for language understanding | Text | TensorFlow
Fourth exercise. This one is actually not that hard, and things only get easier from there. At this point, I suggest you resist the temptation to follow the tutorial above. Also, this is the first time we’re seeing objects in Python, so you will need to remember to address all the object’s variables by prefixing them with self, i.e. “self.mha(…)”. Self is also the first argument in the function definitions on that object, but you don’t pass it when calling the function on the object, so a function “call(self, args)” defined on class “SomeLayer” gets called as “someLayer(args)”. At this point, the sticking point for me was what to pass to the MultiHeadAttention layer as the k, q, and v parameters. You need to pass the input x in all 3 positions; the code figures it out from there. The fourth parameter is the mask; you will actually get a meaningful error message if you forget it.
Fifth through eighth exercises. You should be able to do these without additional hints.
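To make the self business concrete, here is a plain-Python sketch with no TensorFlow involved. FakeAttention and SomeLayer are hypothetical stand-ins I made up, not the lab's classes; the hand-written __call__ only mimics how a Keras layer routes layer(args) to its call method.

```python
class FakeAttention:
    """Hypothetical stand-in for an attention layer; it only records its inputs."""

    def __call__(self, q, k, v, mask):
        # A real layer would compute attention; this one returns what it received.
        return {"q": q, "k": k, "v": v, "mask": mask}


class SomeLayer:
    def __init__(self):
        # Attributes created here live on the object and are reached via self.
        self.mha = FakeAttention()

    def call(self, x, mask):
        # Self-attention: the same input x goes in all three positions.
        return self.mha(x, x, x, mask)

    def __call__(self, *args):
        # Keras layers route layer(args) to call(args); mimicked here by hand.
        return self.call(*args)


some_layer = SomeLayer()
out = some_layer([1, 2, 3], None)  # note: self is not passed explicitly
```

Python fills in self for you when you call something on an instance, which is why the definition has one more parameter than the call.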
Good luck!