Doesn't positional encoding add noise to a word's embedding (features)?

The purpose of an embedding is to convert words into numbers. Solving any NLP problem requires precise embeddings, but while studying the Transformer I came across the concept of positional encoding.

This layer changes every feature by some amount, so how is the model still able to produce good results?
Suppose I have a single word in a sentence with a 6-dimensional vector representation.

Embedding layer output:

[0.4045374, 0.0817129, -0.3558856, 0.1497059, -0.6311387, 0.3135473]

After adding the positional encoding:

[1.2460084, 0.62201524, -0.3094864, 1.148629, -0.62898433, 1.313545]
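
(For reference, the second vector above is exactly what the sinusoidal encoding from "Attention Is All You Need" produces at position 1. Here is a minimal NumPy sketch that reproduces it; the `sinusoidal_pe` helper name is my own, and `pos = 1` is inferred from the numbers:)

```python
import numpy as np

def sinusoidal_pe(pos, d_model):
    # Sinusoidal scheme from "Attention Is All You Need":
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    i = np.arange(d_model // 2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.empty(d_model)
    pe[0::2] = np.sin(angles)  # even indices: sine
    pe[1::2] = np.cos(angles)  # odd indices: cosine
    return pe

embedding = np.array([0.4045374, 0.0817129, -0.3558856,
                      0.1497059, -0.6311387, 0.3135473])

# Position 1 reproduces the second vector from the post:
print(embedding + sinusoidal_pe(1, d_model=6))
# ≈ [1.2460084, 0.62201524, -0.3094864, 1.148629, -0.62898433, 1.313545]
```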

We can see that some features have shifted a lot and some haven't. To me, this doesn't seem right!

For example, suppose a training sentence contains the words King and Queen. During training they would be assigned vectors close to each other because of their related meanings, but positional encoding can push these numbers apart!
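
One mitigating fact worth noting: every word at a given position receives the identical offset, so for two words at the same position their difference vector, and hence their relative geometry, is completely unchanged. A small sketch with made-up stand-in vectors (not real trained embeddings):

```python
import numpy as np

def sinusoidal_pe(pos, d_model):
    i = np.arange(d_model // 2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.empty(d_model)
    pe[0::2] = np.sin(angles)
    pe[1::2] = np.cos(angles)
    return pe

d = 6
rng = np.random.default_rng(0)
king = rng.normal(scale=0.3, size=d)   # hypothetical embedding for "King"
queen = rng.normal(scale=0.3, size=d)  # hypothetical embedding for "Queen"

# Same position => same offset => the difference vector is untouched.
pos = 3
shifted_diff = (king + sinusoidal_pe(pos, d)) - (queen + sinusoidal_pe(pos, d))
print(np.allclose(shifted_diff, king - queen))  # True
```

When the two words sit at different positions the offsets do differ, but that difference is precisely the positional signal the model is supposed to use.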

[Note: I am not asking how positional encoding works, but rather why it works.]


Hi, @Aayush_Jariwala!

You can think of the combination of the embedding output and the positional encoding as a way of "multiplexing" information. It may seem counterintuitive when you look directly at the numeric vectors, but the network can interpret that mixed information and adjust its weights accordingly.
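
One concrete reason the mixed signal is easy for the network to exploit: the original paper points out that for any fixed offset k, PE(pos + k) is a linear function of PE(pos), namely a block-diagonal rotation with one 2x2 block per sin/cos frequency pair. A quick numerical check of that property, reusing the same hypothetical `sinusoidal_pe` helper as above:

```python
import numpy as np

def sinusoidal_pe(pos, d_model):
    i = np.arange(d_model // 2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.empty(d_model)
    pe[0::2] = np.sin(angles)
    pe[1::2] = np.cos(angles)
    return pe

d, pos, k = 6, 5, 3
freqs = 1.0 / np.power(10000.0, 2 * np.arange(d // 2) / d)

# Block-diagonal rotation R(k): one 2x2 rotation per frequency pair.
R = np.zeros((d, d))
for j, w in enumerate(freqs):
    c, s = np.cos(w * k), np.sin(w * k)
    R[2 * j:2 * j + 2, 2 * j:2 * j + 2] = [[c, s], [-s, c]]

# R(k) maps the encoding at `pos` onto the encoding at `pos + k`.
print(np.allclose(R @ sinusoidal_pe(pos, d), sinusoidal_pe(pos + k, d)))  # True
```

Since attention layers apply linear projections to these summed vectors, relative-position relationships of this form are straightforward for the model to pick up.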

That said, there are also papers reporting good performance from models that don't use this type of encoding.
