Why do we need positional encoding instead of representing word position with index values like 0, 1, 2, 3…?
If you used raw index values, the result of embedding + positional encoding would grow large for later positions.
Keeping the scale of the inputs small is important for faster convergence.
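To make the scale issue concrete, here is a minimal NumPy sketch (with assumed sizes `d_model = 512`, `seq_len = 1000`) comparing raw index values against the sinusoidal positional encoding from the original Transformer paper, whose entries always stay in [-1, 1]:

```python
import numpy as np

d_model, seq_len = 512, 1000          # assumed sizes for illustration

# Option A: raw index as the "position signal" -- grows without bound.
raw_positions = np.arange(seq_len, dtype=np.float32)
print(raw_positions.max())            # 999.0, dwarfs unit-scale embeddings

# Option B: sinusoidal positional encoding -- every entry stays in [-1, 1].
pos = np.arange(seq_len)[:, None]                  # (seq_len, 1)
i = np.arange(0, d_model, 2)[None, :]              # (1, d_model/2)
angles = pos / np.power(10000.0, i / d_model)
pe = np.zeros((seq_len, d_model), dtype=np.float32)
pe[:, 0::2] = np.sin(angles)
pe[:, 1::2] = np.cos(angles)
print(pe.min(), pe.max())             # bounded, roughly -1.0 to 1.0
```

So adding the sinusoidal encoding leaves the embeddings at roughly the same scale, whereas adding the raw index would swamp them at large positions.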
Positional encoding is a concept peculiar to the Transformer architecture. Unlike traditional recurrent models such as RNNs, the Transformer has no inherent notion of the order of tokens in a sequence. To address this, positional encodings are added to the token embeddings, providing information about the position of each token within the sequence.
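A rough sketch of why order must be injected (NumPy, random weights, no claim about any real model): plain self-attention treats its input as a set, so permuting the tokens just permutes the outputs and the model cannot distinguish different word orders without positional information:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
tokens = rng.normal(size=(5, d))              # 5 token embeddings, no positions added
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def self_attention(x):
    # scaled dot-product attention with softmax over each row of scores
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ v

perm = np.array([3, 0, 4, 1, 2])              # shuffle the "sentence"
out = self_attention(tokens)
out_shuffled = self_attention(tokens[perm])
print(np.allclose(out[perm], out_shuffled))   # True: only the row order changed
```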
I did some more research, so I'm guessing the reason the Transformer doesn't inherently understand order is that it processes all tokens at once, which makes representing the sequence through index values useless?
Hello @evilyoda
Word embedding is a technique where individual words are represented as real-valued vectors in a lower-dimensional space that captures inter-word semantics. Each word is represented by a real-valued vector with tens or hundreds of dimensions, i.e. every word is given a one-hot encoding which then functions as an index, and corresponding to this index is an n-dimensional vector whose coefficients are learned when training the model.
Positional encoding involves using a d-dimensional vector to represent a specific position in a sentence. This vector is not a part of the model itself but is instead added to the input data to provide the model with information about the position of each word in the sentence.
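Here is a toy sketch of how the two pieces combine (NumPy, with a made-up vocabulary size and a random table standing in for the learned embeddings): token indices look up rows of the embedding table, and the positional encoding for each position is added to the result before it is fed to the Transformer:

```python
import numpy as np

vocab_size, d_model = 100, 16                 # assumed sizes for illustration
rng = np.random.default_rng(1)
embedding_table = rng.normal(size=(vocab_size, d_model))   # learned in a real model

def positional_encoding(seq_len, d_model):
    # sinusoidal encoding: one d_model-dimensional vector per position
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

token_ids = np.array([7, 42, 3, 42])          # hypothetical sentence of 4 token indices
x = embedding_table[token_ids]                # (4, d_model) word embeddings
x = x + positional_encoding(len(token_ids), d_model)       # inject position info
print(x.shape)                                # (4, 16) -- ready for the encoder
```

Note that token 42 appears twice but ends up with two different input vectors, because each occurrence receives the encoding of its own position.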
Nothing is useless: while the index value helps you determine the relation between words, positional encoding helps you determine the position of a word in a given sequence of words.
Regards
DP
Yes my point exactly!!!