Why do we need both Sin and Cos in Positional Encoding?

Yasmeen_Asaad_Azazi · September 22, 2025, 9:48am

Hi everyone,

I’ve been going through the positional encoding part of the Transformer lectures. The explanation given was that using only sin can cause ambiguity since, for example, sin(0°) = sin(180°).

For a d-dimensional encoding, sin(pos / 10000^(2i/d)), if I check across positions (e.g., pos=0 and pos=180), I do see different values, because pos is scaled by 10000^(2i/d) before the sine is taken.

I even tried generating a dataframe of positional encodings using only sin, and all the rows for different positions turned out to be unique.
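
A check along these lines can be sketched as follows. This is not the original poster's code: the helper name `sin_only_encoding`, the choice of `d_model = 8`, the 200 positions, and the use of one frequency per index i in range(d_model // 2) are all assumptions made for illustration, and plain Python lists stand in for the dataframe.

```python
import math

def sin_only_encoding(pos, d_model=8, base=10000.0):
    # Hypothetical helper: sine in every slot, one frequency per index i,
    # following sin(pos / base^(2i/d_model)) from the question.
    return tuple(
        math.sin(pos / base ** (2 * i / d_model))
        for i in range(d_model // 2)
    )

# Assumed sequence length for the experiment.
max_len = 200
rows = [sin_only_encoding(pos) for pos in range(max_len)]

# Each row is a tuple, so a set removes exact duplicates.
n_unique = len(set(rows))
print(n_unique == max_len)  # True when no two positions share a row
```

With these settings the slowest frequency alone (pos / 1000, which stays below 1 radian) already increases strictly with pos, so the rows cannot collide. That matches the observation above: uniqueness is not what sin-only encodings are missing.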

So my question is:

If sin-only encodings are already unique across positions (at least within practical sequence lengths), what additional benefit do we get by adding cos?
Is the main reason mathematical stability?

Would love if someone can clarify why both functions are necessary when sin alone seems to work without collisions.

ai_curious · September 22, 2025, 1:04pm

I have been upskilling in positional encoding recently myself and came across this blog post that I found informative. There is a section near the end on the justification/need for including both sine and cosine in the encoding. Does it help?

tl;dr from the linked blog …

First, notice that since all of the elements of PE are sines, the positions x are actually angles. From trigonometry, we know that any operation T that shifts the argument of a trig function must be some kind of rotation. Rotations can famously be applied by applying a linear transformation to a (cosine, sine) pair.

My emphasis added
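
To make the quoted argument concrete, here is a minimal sketch showing that shifting a position by k is a fixed linear map on each (sine, cosine) pair. The function names, the chosen frequency index, and the test positions are assumptions for illustration; the identity itself is just the angle-addition formulas, sin(a + b) = sin a cos b + cos a sin b and cos(a + b) = cos a cos b - sin a sin b.

```python
import math

def freq(i, d_model=8, base=10000.0):
    # Angular frequency for index i, as in pos / base^(2i/d_model).
    return 1.0 / base ** (2 * i / d_model)

def pair(pos, w):
    # The (sin, cos) pair that one frequency contributes to the encoding.
    return (math.sin(w * pos), math.cos(w * pos))

def shift_matrix(k, w):
    # 2x2 rotation that depends only on the offset k, not on pos.
    c, s = math.cos(w * k), math.sin(w * k)
    return ((c, s), (-s, c))

def apply(m, v):
    # Plain 2x2 matrix-vector product.
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

w = freq(1)   # assumed frequency index
k = 5         # assumed offset
M = shift_matrix(k, w)

max_err = 0.0
for pos in (0, 3, 42, 180):
    shifted = apply(M, pair(pos, w))   # M applied to the encoding at pos
    target = pair(pos + k, w)          # the encoding at pos + k
    max_err = max(max_err,
                  abs(shifted[0] - target[0]),
                  abs(shifted[1] - target[1]))

print(max_err)  # floating-point noise only
```

The same matrix M works for every pos, which is what lets a model express "k steps away" as one linear transformation. With sine alone, sin(w(pos + k)) still requires cos(w pos), and that value cannot be obtained from sin(w pos) by a fixed linear map.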
