Can you please let me know why are we using both sine and cosine functions for position encoding? Why can’t we use any one of them? Also, I could see both are being used alternatively for each dimension. Please explain why this approach?
Can you please let me know why are we using both sine and cosine functions for position encoding? Why can’t we use any one of them? Also, I could see both are being used alternatively for each dimension. Please explain why this approach?
Hi clonesangram,
You can have a look at this explanation using this blog post. Here is an answer to a question about the importance of linear relations between positional encodings.
Thanks, I will look into it