I just finished Course 5 of the specialization (Sequence Models). This course was not as well-structured as the previous ones and felt somewhat rushed. Some details (especially about Transformer models) were handwaved away.
I have a question about positional encoding. I find it strange and unintuitive that the positional encoding vectors are added (summed) to the input vectors X, because summation loses information. For example, if a + b = 5, you can't later tell whether a and b were 2 and 3, or 2.5 and 2.5, and so on.
Why isn’t positional encoding passed as a separate vector (concatenated to the input, as opposed to being added to it)?
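To make the question concrete, here's a minimal numpy sketch of the two options. The encoding is the sinusoidal one from "Attention Is All You Need"; the values of `seq_len`, `d_model`, and the random `X` are toy numbers I picked just for illustration:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal encoding from "Attention Is All You Need":
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    positions = np.arange(seq_len)[:, np.newaxis]              # (seq_len, 1)
    div_terms = 10000 ** (np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_terms)
    pe[:, 1::2] = np.cos(positions / div_terms)
    return pe

seq_len, d_model = 8, 16               # toy sizes
X = np.random.randn(seq_len, d_model)  # stand-in for the token embeddings
pe = sinusoidal_positional_encoding(seq_len, d_model)

# What the course (and the original paper) does: elementwise sum.
# The positions and the embeddings get mixed into the same dimensions.
X_added = X + pe                       # shape stays (seq_len, d_model)

# What I'm asking about: keep the two sources of information separate.
X_concat = np.concatenate([X, pe], axis=-1)  # shape (seq_len, 2 * d_model)
```

With concatenation the model could always recover the position exactly, at the cost of a wider input; with addition the two are entangled, which is exactly what seems lossy to me.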