Encoder in Transformers

Hey,

In the Encoder class, we pass in maximum_position_encoding to build self.pos_encoding. In the call method, we use x += self.pos_encoding[:, :seq_len, :]. Why do we need maximum_position_encoding if we only ever use seq_len?

Hi @shaya_kahn

The positional encodings are precomputed once in the constructor as a table with maximum_position_encoding rows, i.e. one row for every position up to the longest sequence the model is expected to handle. During the call method, only the relevant part of that table, up to seq_len, is sliced out and added to the input x. By doing this we avoid re-computing the positional encodings for every input sequence, and the code is also cleaner this way.
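Here is a minimal sketch of that idea (simplified from the usual TensorFlow Transformer encoder; the exact layer names and the rest of the encoder stack in the assignment will differ, and the positional_encoding helper here is just the standard sinusoidal version):

```python
import numpy as np
import tensorflow as tf

def positional_encoding(max_positions, d_model):
    # Standard sinusoidal encodings, shape (1, max_positions, d_model).
    positions = np.arange(max_positions)[:, np.newaxis]       # (max_positions, 1)
    dims = np.arange(d_model)[np.newaxis, :]                   # (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / np.float32(d_model))
    angle_rads = positions * angle_rates
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])          # sin on even indices
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])          # cos on odd indices
    return tf.cast(angle_rads[np.newaxis, ...], dtype=tf.float32)

class Encoder(tf.keras.layers.Layer):
    def __init__(self, vocab_size, d_model, maximum_position_encoding):
        super().__init__()
        self.d_model = d_model
        self.embedding = tf.keras.layers.Embedding(vocab_size, d_model)
        # Precompute the table ONCE, sized for the longest sequence we expect.
        self.pos_encoding = positional_encoding(maximum_position_encoding, d_model)

    def call(self, x):
        seq_len = tf.shape(x)[1]
        x = self.embedding(x)                                  # (batch, seq_len, d_model)
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
        # Slice out only the first seq_len positions of the precomputed table.
        x += self.pos_encoding[:, :seq_len, :]
        return x
```

So maximum_position_encoding only determines how big the precomputed table is; the slice in call picks out just the rows needed for the current batch.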

Hope it helps! Feel free to ask if you need further assistance.


In other words, maximum_position_encoding just fixes the size of the precomputed table stored in self.pos_encoding, one row per possible token position. At call time, the slice self.pos_encoding[:, :seq_len, :] is added to the embedded input so the encoder receives the order of the tokens, whatever the length of the actual input sequence is.