Week 4: Transformer network

Can you please let me know why we are using both sine and cosine functions for the positional encoding? Why can't we use just one of them? Also, I can see that the two are used alternately across the dimensions. Please explain the reasoning behind this approach.

Hi clonesangram,

You can have a look at the explanation in this blog post. Here is also an answer to a question about the importance of linear relations between positional encodings.
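The short version of why both functions are needed: each sine/cosine pair shares one frequency, and by the angle-addition identities, sin(ω(pos+k)) and cos(ω(pos+k)) are a fixed rotation of sin(ω·pos) and cos(ω·pos). So the encoding of position pos+k is a *linear* function of the encoding of pos, which is exactly the property discussed in that answer; a single sine (or cosine) alone would not give this. For concreteness, here is a minimal NumPy sketch of the encoding from the original "Attention Is All You Need" paper, i.e. PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The helper name is just for illustration, and it assumes an even d_model:

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """Sinusoidal encoding (illustrative helper, assumes even d_model).

    Even dimensions use sine, odd dimensions use cosine, and each
    adjacent sin/cos pair shares the same frequency.
    """
    positions = np.arange(max_len)[:, np.newaxis]          # (max_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]         # (1, d_model/2)
    angle_rates = 1.0 / np.power(10000.0, dims / d_model)  # one frequency per pair
    angles = positions * angle_rates                       # (max_len, d_model/2)

    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even indices: sine
    pe[:, 1::2] = np.cos(angles)  # odd indices: cosine
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=128)
print(pe.shape)  # (50, 128)
```

The alternating layout is just a convention for packing the sin/cos pairs into one vector; what matters is that every frequency contributes both a sine and a cosine component, so the rotation argument above holds for every pair.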

Thanks, I will look into it.