Why do we need positional encoding instead of representing word position with index values like 0, 1, 2, 3…?
If you used raw index values, the result of embedding + positional encoding would grow large for later positions.
Keeping the scale of the inputs small is important for faster convergence.
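To make the scale issue concrete, here is a minimal NumPy sketch (with assumed sizes `d_model = 512`, `seq_len = 1000`) comparing raw index values against the sinusoidal positional encoding from the original Transformer paper, whose entries always stay in [-1, 1]:

```python
import numpy as np

d_model, seq_len = 512, 1000          # assumed sizes for illustration

# Option A: raw index as the "position signal" -- grows without bound.
raw_positions = np.arange(seq_len, dtype=np.float32)
print(raw_positions.max())            # 999.0, dwarfs unit-scale embeddings

# Option B: sinusoidal positional encoding -- every entry stays in [-1, 1].
pos = np.arange(seq_len)[:, None]                  # (seq_len, 1)
i = np.arange(0, d_model, 2)[None, :]              # (1, d_model/2)
angles = pos / np.power(10000.0, i / d_model)
pe = np.zeros((seq_len, d_model), dtype=np.float32)
pe[:, 0::2] = np.sin(angles)
pe[:, 1::2] = np.cos(angles)
print(pe.min(), pe.max())             # bounded, roughly -1.0 to 1.0
```

So adding the sinusoidal encoding leaves the embeddings at roughly the same scale, whereas adding the raw index would swamp them at large positions.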
Positional encoding is a concept peculiar to the Transformer architecture. Unlike traditional recurrent models such as RNNs, the Transformer has no inherent notion of the order of tokens in a sequence. To address this, positional encodings are added to the token embeddings, providing information about the position of each token within the sequence.
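A rough sketch of why order must be injected (NumPy, random weights, no claim about any real model): plain self-attention treats its input as a set, so permuting the tokens just permutes the outputs and the model cannot distinguish different word orders without positional information:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
tokens = rng.normal(size=(5, d))              # 5 token embeddings, no positions added
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def self_attention(x):
    # scaled dot-product attention with softmax over each row of scores
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ v

perm = np.array([3, 0, 4, 1, 2])              # shuffle the "sentence"
out = self_attention(tokens)
out_shuffled = self_attention(tokens[perm])
print(np.allclose(out[perm], out_shuffled))   # True: only the row order changed
```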
I did some more research, so I'm guessing the reason the Transformer doesn't inherently understand order is that it processes all tokens at once, which makes representing the sequence through index values useless?
Hello @evilyoda
Word embedding is a technique where individual words are represented as real-valued vectors in a lower-dimensional space that captures inter-word semantics. Each word is represented by a real-valued vector with tens or hundreds of dimensions, i.e. every word is given a one-hot encoding which then functions as an index, and corresponding to this index is an n-dimensional vector whose coefficients are learned when training the model.
Positional encoding involves using a d-dimensional vector to represent a specific position in a sentence. This vector is not a part of the model itself but is instead added to the input data to provide the model with information about the position of each word in the sentence.
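Here is a toy sketch of how the two pieces combine (NumPy, with a made-up vocabulary size and a random table standing in for the learned embeddings): token indices look up rows of the embedding table, and the positional encoding for each position is added to the result before it is fed to the Transformer:

```python
import numpy as np

vocab_size, d_model = 100, 16                 # assumed sizes for illustration
rng = np.random.default_rng(1)
embedding_table = rng.normal(size=(vocab_size, d_model))   # learned in a real model

def positional_encoding(seq_len, d_model):
    # sinusoidal encoding: one d_model-dimensional vector per position
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

token_ids = np.array([7, 42, 3, 42])          # hypothetical sentence of 4 token indices
x = embedding_table[token_ids]                # (4, d_model) word embeddings
x = x + positional_encoding(len(token_ids), d_model)       # inject position info
print(x.shape)                                # (4, 16) -- ready for the encoder
```

Note that token 42 appears twice but ends up with two different input vectors, because each occurrence receives the encoding of its own position.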
Nothing is useless: while the index value helps you determine the relation between words, positional encoding helps you determine the position of a word in a given sequence of words.
Regards
DP
Yes my point exactly!!!