C1M4 Transformer architecture: Why isn't the position vector made up of 1s and 0s?

In the second video of the module “Transformer architecture”, it is described how the input is embedded, creating a first guess at the semantic meaning of the prompt, and also how the LLM is fed a position vector for the tokenised prompt. Since a token either occupies or does not occupy a given position in a sentence, I would expect this to be a binary vector. For example, in “I ate pasta”, for “I” I would expect [1, 0, 0] or something like that, since “I” is the first token in the sentence, not the 2nd nor the 3rd.
Instead, the vector shown in the video contains decimals, and I do not understand what the decimals mean: either a token occupies a given position or it does not.

Perhaps your assumption is incorrect.

I can understand that, and my question is asking for an explanation, but yours is completely useless: why did you even bother to write it? Was it just to feel better? It does not add any value to this forum.

I was attempting to encourage you to explain why you believe the vector values should be 0’s or 1’s. That seemed to me to be the key to the question.

I tried to explain

:blue_book: Positional Encoding in Transformers

The positional encoding is defined as:

For even dimensions:

PE_{(pos, 2i)} = \sin\left(\frac{pos}{10000^{\frac{2i}{d_{\text{model}}}}} \right)

For odd dimensions:

PE_{(pos, 2i+1)} = \cos\left(\frac{pos}{10000^{\frac{2i}{d_{\text{model}}}}} \right)

:magnifying_glass_tilted_left: Explanation of Terms

  • d_model: Total number of dimensions in the model (e.g., 512).

  • i: Index over dimension pairs; 2i is an even dimension and 2i+1 is the odd dimension that follows (e.g., i = 2 gives dimensions 4 and 5 of the 512-dimensional vector).

  • pos: Position of the token in the input sequence (e.g., in "My name is Muzammil", the word "name" has position 1 if indexing starts from 0).

These encodings are added to the input embeddings to give the model a sense of token order without using recurrence.

These two formulas alternate across the dimension indices of a given token's position vector: even dimensions use sine, odd dimensions use cosine.
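To make the alternation concrete, here is a minimal pure-Python sketch of these formulas (an illustration, not the course's implementation):

```python
import math

def positional_encoding(seq_len: int, d_model: int) -> list[list[float]]:
    """Build the sinusoidal positional-encoding matrix (seq_len x d_model)."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for dim in range(0, d_model, 2):        # dim = 2i, the even index
            angle = pos / (10000 ** (dim / d_model))
            pe[pos][dim] = math.sin(angle)      # even dimension: sine
            if dim + 1 < d_model:
                pe[pos][dim + 1] = math.cos(angle)  # odd dimension: cosine
    return pe

pe = positional_encoding(seq_len=4, d_model=512)
```

Each row is a dense vector of values in [-1, 1], which is why the video shows decimals rather than a one-hot 0/1 vector.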

:blue_book: Positional Encoding for Position = 2, Dimension Index = 2

Using the standard Transformer formulas given above:

:white_check_mark: Computed Values

  • Given:

    • ( pos = 2 )
    • ( i = 1 ), so the even dimension is ( 2i = 2 ) and the odd dimension is ( 2i+1 = 3 )
    • ( d_model = 512 )
  • Angle rate:

    \frac{2}{10000^{\frac{2}{512}}} = 1.9293232398223983
  • PE(pos=2, 2i=2):

    \sin(1.9293) = 0.9364
  • PE(pos=2, 2i+1=3):

    \cos(1.9293) = -0.3509
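These values can be reproduced in a couple of lines of Python (the exponent 2/512 corresponds to even dimension 2 of a 512-dimensional model):

```python
import math

angle = 2 / 10000 ** (2 / 512)    # pos = 2, even dimension 2
print(round(angle, 4))            # 1.9293
print(round(math.sin(angle), 4))  # 0.9364  -> PE at the even dimension
print(round(math.cos(angle), 4))  # -0.3509 -> PE at the odd dimension
```

Note the results are plain decimals in [-1, 1]: the positional encoding is not a one-hot vector but a smooth pattern whose frequencies vary across dimensions, letting the model distinguish positions.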