Encoder in Transformers

Hey,

In the Encoder class, we pass in maximum_position_encoding to build self.pos_encoding. In the call method, we use x += self.pos_encoding[:, :seq_len, :]. Why do we need maximum_position_encoding if we only ever use seq_len?

Hi @shaya_kahn

The positional encodings are precomputed once in the constructor as a table with maximum_position_encoding rows, i.e. one row for every position up to the longest sequence the model is expected to handle. During the call method, only the relevant part of that table, up to seq_len, is sliced out and added to the input x. By doing this we avoid re-computing the positional encodings for every input sequence, and the code is also cleaner this way.
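Here is a minimal sketch of that idea (simplified from the usual TensorFlow Transformer encoder; the exact layer names and the rest of the encoder stack in the assignment will differ, and the positional_encoding helper here is just the standard sinusoidal version):

```python
import numpy as np
import tensorflow as tf

def positional_encoding(max_positions, d_model):
    # Standard sinusoidal encodings, shape (1, max_positions, d_model).
    positions = np.arange(max_positions)[:, np.newaxis]       # (max_positions, 1)
    dims = np.arange(d_model)[np.newaxis, :]                   # (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / np.float32(d_model))
    angle_rads = positions * angle_rates
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])          # sin on even indices
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])          # cos on odd indices
    return tf.cast(angle_rads[np.newaxis, ...], dtype=tf.float32)

class Encoder(tf.keras.layers.Layer):
    def __init__(self, vocab_size, d_model, maximum_position_encoding):
        super().__init__()
        self.d_model = d_model
        self.embedding = tf.keras.layers.Embedding(vocab_size, d_model)
        # Precompute the table ONCE, sized for the longest sequence we expect.
        self.pos_encoding = positional_encoding(maximum_position_encoding, d_model)

    def call(self, x):
        seq_len = tf.shape(x)[1]
        x = self.embedding(x)                                  # (batch, seq_len, d_model)
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
        # Slice out only the first seq_len positions of the precomputed table.
        x += self.pos_encoding[:, :seq_len, :]
        return x
```

So maximum_position_encoding only determines how big the precomputed table is; the slice in call picks out just the rows needed for the current batch.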

Hope it helps! Feel free to ask if you need further assistance.


In other words, maximum_position_encoding just fixes the size of the precomputed table stored in self.pos_encoding, one row per possible token position. At call time, the slice self.pos_encoding[:, :seq_len, :] is added to the embedded input so the encoder receives the order of the tokens, whatever the length of the actual input sequence is.