I’m currently working on the Encoder class exercise (exercise 5). I’m getting a shape error in the part where you add the positional encoding to the embedding. My initial pass through the embedding looks like:
x = self.embedding(x)
and my scaling looks like:
x *= tf.math.sqrt(tf.cast(self.embedding_dim, dtype='float32'))
Have I done these steps wrong, or do I need to incorporate the mask somewhere in them? The role that the mask and the seq_len variable play is unclear to me. Thanks for the help!
I am very dumb. Please ignore me
I think you found the ListWrapper, which is the list of N EncoderLayer objects, and passed the mask to those. That’s good.
“The role that the mask and the seq_len variable play is unclear to me.”
In this exercise, the relationships among several variables are slightly unclear. Also, some of the comments describing dimensions mistakenly use “fully_connected_dim” where they should say “embedding_dim”. (Note that this only affects the comments, not the code.) I believe the Deeplearning.ai team has been working on cleaning this up.
Here is an overview of the Transformer encoder.
As you already found, the mask is used in the multi-head attention. To be more exact, it is applied in the scaled dot-product attention, just before the Softmax.
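As a rough sketch (not the notebook’s exact code, and assuming the convention that the mask holds 1 for real tokens and 0 for padding), the mask enters roughly like this:

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention scores: (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(d_k) to keep the dot products in a reasonable range
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # The padding mask is applied HERE, before the Softmax:
    # positions with mask == 0 get a large negative logit,
    # so their Softmax weight becomes ~0
    if mask is not None:
        scaled_attention_logits += (1. - mask) * -1e9

    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    output = tf.matmul(attention_weights, v)  # (..., seq_len_q, depth_v)
    return output, attention_weights
```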
seq_len is a fixed length. If an actual sentence has fewer words than that, we pad it up to seq_len. But when computing the Softmax probability distribution, the padding should be ignored, so a padding mask is applied before the Softmax.
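Such a padding mask can be built directly from the token ids, for example like this (again just a sketch; it assumes the padding token id is 0 and uses the same 1-for-keep convention as above):

```python
import tensorflow as tf

def create_padding_mask(seq):
    """Return 1. for real tokens and 0. for padding (assumes pad id == 0)."""
    mask = tf.cast(tf.math.not_equal(seq, 0), tf.float32)
    # Add extra dims so it broadcasts over (batch, heads, seq_len_q, seq_len_k)
    return mask[:, tf.newaxis, tf.newaxis, :]

# Two sentences padded to seq_len = 5
batch = tf.constant([[7, 6, 0, 0, 0],
                     [1, 2, 3, 4, 0]])
print(create_padding_mask(batch))  # shape (2, 1, 1, 5)
```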
I suppose you have already figured this out, but this is for other learners who may find this thread through search.
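And since the original question was about a shape error when adding the positional encoding: the usual cause is adding the full pre-computed pos_encoding instead of slicing it down to the current seq_len. Here is a rough sketch of the Encoder’s call method, not the official solution; attribute names such as self.pos_encoding, self.dropout and self.enc_layers are assumptions based on the assignment, and tf is the TensorFlow import from the notebook:

```python
def call(self, x, training, mask):
    seq_len = tf.shape(x)[1]

    x = self.embedding(x)                                       # (batch, seq_len, embedding_dim)
    x *= tf.math.sqrt(tf.cast(self.embedding_dim, tf.float32))  # scale the embedding

    # pos_encoding was precomputed for the maximum length,
    # so slice it to the current seq_len before adding
    x += self.pos_encoding[:, :seq_len, :]

    x = self.dropout(x, training=training)

    # The mask is only passed down to the attention layers
    for layer in self.enc_layers:
        x = layer(x, training, mask)

    return x
```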