Encoder in Transformers


In the Encoder class, we pass in maximum_position_encoding to build self.pos_encoding. In the call method, we use x += self.pos_encoding[:, :seq_len, :]. Why do we need maximum_position_encoding if we only use the first seq_len positions?

Hi @shaya_kahn

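The positional encodings are computed once, when the Encoder is constructed, for every position from 0 up to maximum_position_encoding - 1, and stored in self.pos_encoding. During the call method, only the first seq_len positions of that table are sliced out and added to the input x. This way we avoid re-computing the positional encodings for each input sequence, inputs of different lengths can share the same table, and the code is cleaner.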
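To make the precompute-then-slice idea concrete, here is a minimal NumPy sketch. It is not the course's actual code: TinyEncoder and this positional_encoding helper are hypothetical stand-ins, and the sinusoidal formula is assumed to be the standard one from the Transformer paper.

```python
import numpy as np

def positional_encoding(max_positions, d_model):
    # Hypothetical helper: standard sinusoidal encodings (assumed formula).
    positions = np.arange(max_positions)[:, np.newaxis]   # (max_positions, 1)
    dims = np.arange(d_model)[np.newaxis, :]               # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / np.float32(d_model))
    angles = positions * angle_rates                       # (max_positions, d_model)
    angles[:, 0::2] = np.sin(angles[:, 0::2])              # even dims: sine
    angles[:, 1::2] = np.cos(angles[:, 1::2])              # odd dims: cosine
    return angles[np.newaxis, ...].astype(np.float32)      # (1, max_positions, d_model)

class TinyEncoder:
    # Hypothetical stand-in for the Encoder: only the positional part is shown.
    def __init__(self, d_model, maximum_position_encoding):
        # Computed once, for every position the model may ever see.
        self.pos_encoding = positional_encoding(maximum_position_encoding, d_model)

    def call(self, x):
        seq_len = x.shape[1]
        # Only the first seq_len rows of the precomputed table are used.
        return x + self.pos_encoding[:, :seq_len, :]

encoder = TinyEncoder(d_model=8, maximum_position_encoding=50)
short = encoder.call(np.zeros((2, 5, 8), dtype=np.float32))
longer = encoder.call(np.zeros((2, 20, 8), dtype=np.float32))
```

Both calls reuse the same 50-row table; the 5-token batch and the 20-token batch simply take different-sized slices of it, so the first 5 positions get identical encodings in both.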

Hope it helps! Feel free to ask if you need further assistance.


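maximum_position_encoding sets how many positions self.pos_encoding is precomputed for. It is about token positions (order) in the sequence, not the vocabulary size. While encoding, self.pos_encoding is sliced to seq_len so that each input token gets the encoding for its own position. This means maximum_position_encoding has to be at least as large as the longest sequence you plan to pass in.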
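One consequence of slicing a precomputed table is that maximum_position_encoding acts as an upper bound on sequence length. The NumPy sketch below uses a made-up zero table (its values are irrelevant here, only the shapes matter) to show what happens when seq_len exceeds the table size; the variable names are illustrative and not from the course code.

```python
import numpy as np

d_model = 8
maximum_position_encoding = 4

# Placeholder table with the usual (1, max_positions, d_model) shape.
pos_encoding = np.zeros((1, maximum_position_encoding, d_model), dtype=np.float32)

# An input that is longer than the table: 6 tokens, but only 4 positions exist.
x = np.zeros((1, 6, d_model), dtype=np.float32)
seq_len = x.shape[1]

# Slicing past the end does not fail; it silently returns only 4 rows.
sliced = pos_encoding[:, :seq_len, :]

# The addition is where the mismatch surfaces: (1, 6, 8) + (1, 4, 8).
try:
    x + sliced
    add_failed = False
except ValueError:
    add_failed = True
```

In this sketch the slice quietly comes back with 4 rows and the shape mismatch only shows up at the addition, which is why the maximum is usually chosen comfortably above the longest expected input.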