Pedagogy of C5 W4 A1 Transformer (Ex4: Encoder Layer)

I agree with @kechan. Transformers could be a sixth course on its own, with more material on BERT (and its variants), ViT, and other recently published work. I would love to attend such an extension of the course.

Adding my +1. This course is really confusing. There are also inconsistencies in how some of the class functions are called.

Example:
in 5.1 we unpack the values returned by mha into separate variables;
in 4.1 the variable is defined for us and holds a tuple of the attention output and the attention weights, which requires indexing on a later line (and this isn't obvious until you fail to add x to your output).
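For anyone else tripped up by this, here is a minimal sketch of the two calling conventions, assuming the standard tf.keras.layers.MultiHeadAttention layer (the shapes and variable names below are illustrative, not the assignment's actual code):

```python
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=64)
x = tf.random.uniform((1, 10, 512))  # (batch, seq_len, embedding_dim)

# Convention 1: unpack the returned tuple into two variables up front.
attn_output, attn_weights = mha(x, x, return_attention_scores=True)
out = x + attn_output  # residual connection works directly

# Convention 2: keep the whole tuple and index into it later.
mha_result = mha(x, x, return_attention_scores=True)
out = x + mha_result[0]  # forgetting the [0] makes `x + mha_result` fail
```

With `return_attention_scores=True`, the layer returns `(attention_output, attention_scores)`, so both styles are equivalent; the confusion is only that the notebook switches between them without saying so.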

All in all, I find that this course tries too hard to weave Python classes together with the Transformer architecture, to poor effect. One fix would be to document the classes and methods properly instead of having students guess the arguments, and to remove the boilerplate code inside the call() method. A sketch of what that documentation could look like follows.
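To make the suggestion concrete, here is a minimal sketch of a documented encoder layer. It follows the standard Transformer encoder-layer structure (self-attention, then a feed-forward block, each with a residual connection and layer normalization); the names, defaults, and docstrings are my own assumptions, not the assignment's actual code:

```python
import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    """One Transformer encoder layer: self-attention followed by a
    feed-forward network, each with dropout, a residual connection,
    and layer normalization."""

    def __init__(self, embedding_dim, num_heads, fully_connected_dim,
                 dropout_rate=0.1):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embedding_dim)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation="relu"),
            tf.keras.layers.Dense(embedding_dim),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout_ffn = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, training, mask):
        """
        Args:
            x: tensor of shape (batch_size, seq_len, embedding_dim)
            training: bool, enables dropout only during training
            mask: padding mask passed to self-attention
        Returns:
            Tensor of shape (batch_size, seq_len, embedding_dim)
        """
        # Without return_attention_scores=True, mha returns a single
        # tensor, so no indexing is needed here.
        attn_output = self.mha(x, x, attention_mask=mask)
        out1 = self.layernorm1(x + attn_output)        # residual + norm

        ffn_output = self.ffn(out1)
        ffn_output = self.dropout_ffn(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)      # residual + norm
```

If every call() in the notebooks carried an Args/Returns block like this, students could focus on the architecture instead of reverse-engineering the plumbing.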

After a very thorough and difficult effort I succeeded in completing the week 4 assignment. The first thing I did afterward was come to the community to check whether I was the problem. Thank you for this post; many of the replies include great answers that helped me better understand what I did.


Thank you for this link, Colin!