Week 4: Transformer network

clonesangram · October 5, 2021, 2:27pm

Can you please let me know why we are using both sine and cosine functions for positional encoding? Why can’t we use just one of them? Also, I can see that the two are used alternately across the dimensions. Could you explain the reason for this approach?

reinoudbosch · October 5, 2021, 4:10pm

Hi clonesangram,

You can have a look at the explanation in this blog post. Here is also an answer to a question about the importance of linear relations between positional encodings.
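To make the "linear relations" point concrete, here is a minimal sketch in plain Python. It assumes the standard sinusoidal formulation (sine on even dimensions, cosine on odd dimensions, base 10000) and an even `d_model`; the function names `positional_encoding` and `shift` are hypothetical and not taken from the course assignment. The idea it illustrates: because each frequency has both a sine and a cosine component, the encoding of position `pos + k` can be obtained from the encoding of `pos` by a fixed 2x2 rotation per pair that depends only on the offset `k`. With sine alone that linear map would not be available.

```python
import math

def positional_encoding(pos, d_model, base=10000.0):
    """Sinusoidal encoding of one position (hypothetical helper).

    Even dimensions hold sin, odd dimensions hold cos; d_model is assumed even.
    """
    pe = []
    for i in range(d_model // 2):
        angle = pos / base ** (2 * i / d_model)
        pe.extend([math.sin(angle), math.cos(angle)])
    return pe

def shift(pe, k, d_model, base=10000.0):
    """Map PE(pos) to PE(pos + k) using only the offset k.

    Each (sin, cos) pair is rotated by the angle w * k, which is a linear
    operation on the pair (angle-addition identities).
    """
    out = []
    for i in range(d_model // 2):
        w = 1.0 / base ** (2 * i / d_model)
        s, c = pe[2 * i], pe[2 * i + 1]
        out.append(s * math.cos(w * k) + c * math.sin(w * k))  # sin(a + b)
        out.append(c * math.cos(w * k) - s * math.sin(w * k))  # cos(a + b)
    return out

d_model = 8
direct = positional_encoding(7, d_model)                       # PE(7) computed directly
shifted = shift(positional_encoding(3, d_model), 4, d_model)   # PE(3) moved by k = 4
max_err = max(abs(a - b) for a, b in zip(direct, shifted))
print(max_err < 1e-9)
```

Note that `shift` never looks at the absolute position, only at the stored sine/cosine values and the offset. This is the property usually given as the reason relative positions are easy for the attention layers to pick up.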

clonesangram · October 5, 2021, 4:30pm

Thanks, I will look into it
