The purpose of an embedding layer is to convert words into numerical vectors. Solving any NLP problem depends on precise embeddings, but while studying the Transformer I came across the concept of positional encoding.
This layer shifts each feature of the embedding by some amount. How is the model still able to produce good results?
Suppose I have a single word in a sentence with a 6-dimensional vector representation.
Embedding layer output:
[0.4045374, 0.0817129, -0.3558856, 0.1497059, -0.6311387, 0.3135473]
After adding positional encoding layer:
[1.2460084, 0.62201524, -0.3094864, 1.148629, -0.62898433, 1.313545]
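For reference, these numbers appear to match the sinusoidal formulation from "Attention Is All You Need" evaluated at position 1. Here is a minimal NumPy sketch that reproduces the addition:

```python
import numpy as np

def positional_encoding(pos, d_model):
    # Sinusoidal encoding from "Attention Is All You Need":
    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    pe = np.zeros(d_model)
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        pe[i] = np.sin(angle)
        pe[i + 1] = np.cos(angle)
    return pe

embedding = np.array([0.4045374, 0.0817129, -0.3558856,
                      0.1497059, -0.6311387, 0.3135473])
print(embedding + positional_encoding(1, 6))
# matches the vector above, up to rounding:
# [ 1.2460084  0.6220152 -0.3094864  1.1486290 -0.6289843  1.3135450]
```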
We can see that some features have deviated a lot while others have barely moved. To me, this doesn't seem right!
For example, suppose the training sentences contain the words King and Queen. During training, they would be assigned vectors that are close to each other because of their related meanings. But positional encoding can push these vectors apart!
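To make the worry concrete, here is a hypothetical illustration (reusing the vector above as a stand-in for the King embedding): the same word placed at two different positions is noticeably less similar to itself once the encoding is added.

```python
import numpy as np

def positional_encoding(pos, d_model):
    pe = np.zeros(d_model)
    for i in range(0, d_model, 2):
        angle = pos / (10000 ** (i / d_model))
        pe[i], pe[i + 1] = np.sin(angle), np.cos(angle)
    return pe

def cos_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical stand-in for the "King" embedding (the vector above).
king = np.array([0.4045374, 0.0817129, -0.3558856,
                 0.1497059, -0.6311387, 0.3135473])

# The same word at positions 0 and 5 after adding positional encoding:
at_pos0 = king + positional_encoding(0, 6)
at_pos5 = king + positional_encoding(5, 6)
print(cos_sim(at_pos0, at_pos5))  # ~0.84, no longer 1.0
```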
[Note: I am not asking how positional encoding works, but rather why it works.]