It took me a while to grok and finish the EncoderLayer class… super interesting stuff that challenged my brain, so yay. What steered me wrong initially, and what I still do not understand, is why every line of code we are expected to fill in ourselves in that class implementation ends with the comment:
(batch_size, input_seq_len, embedding_dim)
…even though, as far as I can tell, it adds no useful context.
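One reading that would explain it: the comment records the shape of the tensor each line returns. It repeats because every step of an encoder layer is built to hand back that same shape, which is what lets the residual additions (`x + attn_output`, then `out1 + ffn_output`) line up. Below is a minimal NumPy sketch of that idea, not the assignment's actual code. It rests on my own assumptions: single-head attention instead of multi-head, no dropout or masking, a layer norm without learned scale/shift, and made-up sizes. Names like `self_attention`, `encoder_layer` and `params` are hypothetical.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration (not from the assignment).
batch_size, input_seq_len, embedding_dim, fully_connected_dim = 2, 5, 8, 16
rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-6):
    # Normalize over the last axis (embedding_dim); the shape is unchanged.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product attention (a simplification of multi-head).
    q, k, v = x @ wq, x @ wk, x @ wv                          # each (batch_size, input_seq_len, embedding_dim)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1])  # (batch_size, input_seq_len, input_seq_len)
    return softmax(scores) @ v                                # (batch_size, input_seq_len, embedding_dim)

def encoder_layer(x, params):
    expected = (batch_size, input_seq_len, embedding_dim)
    attn_output = self_attention(x, *params["attn"])  # (batch_size, input_seq_len, embedding_dim)
    assert attn_output.shape == expected
    out1 = layer_norm(x + attn_output)                 # (batch_size, input_seq_len, embedding_dim)
    assert out1.shape == expected
    hidden = np.maximum(0, out1 @ params["w1"])        # (batch_size, input_seq_len, fully_connected_dim)
    ffn_output = hidden @ params["w2"]                 # (batch_size, input_seq_len, embedding_dim)
    assert ffn_output.shape == expected
    out2 = layer_norm(out1 + ffn_output)               # (batch_size, input_seq_len, embedding_dim)
    assert out2.shape == expected
    return out2

params = {
    "attn": [rng.normal(size=(embedding_dim, embedding_dim)) for _ in range(3)],
    "w1": rng.normal(size=(embedding_dim, fully_connected_dim)),
    "w2": rng.normal(size=(fully_connected_dim, embedding_dim)),
}
x = rng.normal(size=(batch_size, input_seq_len, embedding_dim))
out = encoder_layer(x, params)
print(out.shape)  # (2, 5, 8)
```

In this sketch the only intermediate with a different shape is `hidden`, inside the feed-forward block. Everything you would actually fill in comes out as `(batch_size, input_seq_len, embedding_dim)`, so the repeated comment works more as a sanity check than as new information.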