Doesn't positional encoding add noise to a word's embedding (features)?

The purpose of an embedding is to convert words into numbers. Solving any NLP problem requires precise embeddings, but while studying the Transformer I came across the concept of positional encoding.

This layer changes every feature by some amount, so how is the model still able to produce good results?
Suppose I have a single word in a sentence with a 6-dimensional vector representation.

Embedding layer output:

[0.4045374, 0.0817129, -0.3558856, 0.1497059, -0.6311387, 0.3135473]

After adding the positional encoding:

[1.2460084, 0.62201524, -0.3094864, 1.148629, -0.62898433, 1.313545]
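
(For reference, the second vector above is exactly what the sinusoidal encoding from "Attention Is All You Need" produces at position 1. Here is a minimal NumPy sketch that reproduces it; the `sinusoidal_pe` helper name is my own, and `pos = 1` is inferred from the numbers:)

```python
import numpy as np

def sinusoidal_pe(pos, d_model):
    # Sinusoidal scheme from "Attention Is All You Need":
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    i = np.arange(d_model // 2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.empty(d_model)
    pe[0::2] = np.sin(angles)  # even indices: sine
    pe[1::2] = np.cos(angles)  # odd indices: cosine
    return pe

embedding = np.array([0.4045374, 0.0817129, -0.3558856,
                      0.1497059, -0.6311387, 0.3135473])

# Position 1 reproduces the second vector from the post:
print(embedding + sinusoidal_pe(1, d_model=6))
# ≈ [1.2460084, 0.62201524, -0.3094864, 1.148629, -0.62898433, 1.313545]
```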

We can see that some features have shifted a lot and some haven't. To me, this doesn't seem right!

For example, suppose a training sentence contains the words King and Queen. During training they would be assigned vectors close to each other because of their related meanings, but positional encoding can push these numbers apart!
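
One mitigating fact worth noting: every word at a given position receives the identical offset, so for two words at the same position their difference vector, and hence their relative geometry, is completely unchanged. A small sketch with made-up stand-in vectors (not real trained embeddings):

```python
import numpy as np

def sinusoidal_pe(pos, d_model):
    i = np.arange(d_model // 2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.empty(d_model)
    pe[0::2] = np.sin(angles)
    pe[1::2] = np.cos(angles)
    return pe

d = 6
rng = np.random.default_rng(0)
king = rng.normal(scale=0.3, size=d)   # hypothetical embedding for "King"
queen = rng.normal(scale=0.3, size=d)  # hypothetical embedding for "Queen"

# Same position => same offset => the difference vector is untouched.
pos = 3
shifted_diff = (king + sinusoidal_pe(pos, d)) - (queen + sinusoidal_pe(pos, d))
print(np.allclose(shifted_diff, king - queen))  # True
```

When the two words sit at different positions the offsets do differ, but that difference is precisely the positional signal the model is supposed to use.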

[Note: I am not asking how positional encoding works, but rather why it works.]


Hi, @Aayush_Jariwala!

You can think of the combination of the embedding output and the positional encoding as a way of "multiplexing" information. It may seem counterintuitive when you look directly at the numeric vectors, but the network can interpret that mixed information and adjust its weights accordingly.
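
One concrete reason the mixed signal is easy for the network to exploit: the original paper points out that for any fixed offset k, PE(pos + k) is a linear function of PE(pos), namely a block-diagonal rotation with one 2x2 block per sin/cos frequency pair. A quick numerical check of that property, reusing the same hypothetical `sinusoidal_pe` helper as above:

```python
import numpy as np

def sinusoidal_pe(pos, d_model):
    i = np.arange(d_model // 2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.empty(d_model)
    pe[0::2] = np.sin(angles)
    pe[1::2] = np.cos(angles)
    return pe

d, pos, k = 6, 5, 3
freqs = 1.0 / np.power(10000.0, 2 * np.arange(d // 2) / d)

# Block-diagonal rotation R(k): one 2x2 rotation per frequency pair.
R = np.zeros((d, d))
for j, w in enumerate(freqs):
    c, s = np.cos(w * k), np.sin(w * k)
    R[2 * j:2 * j + 2, 2 * j:2 * j + 2] = [[c, s], [-s, c]]

# R(k) maps the encoding at `pos` onto the encoding at `pos + k`.
print(np.allclose(R @ sinusoidal_pe(pos, d), sinusoidal_pe(pos + k, d)))  # True
```

Since attention layers apply linear projections to these summed vectors, relative-position relationships of this form are straightforward for the model to pick up.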

That said, there are also papers reporting good performance from models that don't use this type of encoding.
