C1M4 Transformer architecture: Why isn't the position vector made up of 1s and 0s?

In the second video of the module “Transformer architecture”, it is described how the input is embedded, creating a first guess at the semantic meaning of the prompt, and also how the LLM is fed a position vector for the tokenised prompt. Since a token either occupies or does not occupy a given position in a sentence, I would expect this to be a binary vector. For example, in “I ate pasta”, for “I” I would expect [1, 0, 0] or something like that, since “I” is the first token in the sentence, not the 2nd nor the 3rd.
Instead, the vector shown in the video contains decimals, and I do not understand what the decimals mean: either a token occupies a given position or it does not.

Perhaps your assumption is incorrect.

I can understand that, and my question is asking for an explanation, but yours is completely useless: why did you even bother to write it? Was it just to feel better? It does not add any value to this forum.

I was attempting to encourage you to explain why you believe the vector values should be 0’s or 1’s. That seemed to me to be the key to the question.

I tried to explain

:blue_book: Positional Encoding in Transformers

The positional encoding is defined as:

For even dimensions:

PE_{(pos, 2i)} = \sin\left(\frac{pos}{10000^{\frac{2i}{d_{\text{model}}}}} \right)

For odd dimensions:

PE_{(pos, 2i+1)} = \cos\left(\frac{pos}{10000^{\frac{2i}{d_{\text{model}}}}} \right)

:magnifying_glass_tilted_left: Explanation of Terms

  • d_model: Total number of dimensions in the model (e.g., 512).

  • i: Index over dimension pairs; 2i is an even dimension and 2i+1 is the odd dimension that follows (e.g., i = 2 gives dimensions 4 and 5 of the 512-dimensional vector).

  • pos: Position of the token in the input sequence (e.g., in "My name is Muzammil", the word "name" has position 1 if indexing starts from 0).

These encodings are added to the input embeddings to give the model a sense of token order without using recurrence.

These two formulas alternate across the dimension indices of a given token's position vector: even dimensions use sine, odd dimensions use cosine.
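To make the alternation concrete, here is a minimal pure-Python sketch of these formulas (an illustration, not the course's implementation):

```python
import math

def positional_encoding(seq_len: int, d_model: int) -> list[list[float]]:
    """Build the sinusoidal positional-encoding matrix (seq_len x d_model)."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for dim in range(0, d_model, 2):        # dim = 2i, the even index
            angle = pos / (10000 ** (dim / d_model))
            pe[pos][dim] = math.sin(angle)      # even dimension: sine
            if dim + 1 < d_model:
                pe[pos][dim + 1] = math.cos(angle)  # odd dimension: cosine
    return pe

pe = positional_encoding(seq_len=4, d_model=512)
```

Each row is a dense vector of values in [-1, 1], which is why the video shows decimals rather than a one-hot 0/1 vector.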

:blue_book: Positional Encoding for Position = 2, Dimension Index = 2

Using the standard Transformer formulas given above:

:white_check_mark: Computed Values

  • Given:

    • ( pos = 2 )
    • ( i = 1 ), so the even dimension is ( 2i = 2 ) and the odd dimension is ( 2i+1 = 3 )
    • ( d_model = 512 )
  • Angle rate:

    \frac{2}{10000^{\frac{2}{512}}} = 1.9293232398223983
  • PE(pos=2, 2i=2):

    \sin(1.9293) = 0.9364
  • PE(pos=2, 2i+1=3):

    \cos(1.9293) = -0.3509
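These values can be reproduced in a couple of lines of Python (the exponent 2/512 corresponds to even dimension 2 of a 512-dimensional model):

```python
import math

angle = 2 / 10000 ** (2 / 512)    # pos = 2, even dimension 2
print(round(angle, 4))            # 1.9293
print(round(math.sin(angle), 4))  # 0.9364  -> PE at the even dimension
print(round(math.cos(angle), 4))  # -0.3509 -> PE at the odd dimension
```

Note the results are plain decimals in [-1, 1]: the positional encoding is not a one-hot vector but a smooth pattern whose frequencies vary across dimensions, letting the model distinguish positions.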