Can positional encoding be meaningfully generalized to non-integer positions?

One of the new things I learned while studying the Transformer is the use of positional encoding. It is quite a fascinating concept, and if mentors know of any papers or resources that focus on the theoretical (and intuitive) properties of good positional encodings in general, please post them.

I also wonder out loud whether “pos” in the sin/cosine positional encoding used in the Transformer can be meaningfully generalized to non-integer positions. E.g. if your sequence is a time series, each position could be annotated with the time the event took place (which is a float).
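For what it's worth, nothing in the sin/cos formula itself requires `pos` to be an integer: sine and cosine accept any real argument, so you can evaluate the same encoding at float positions such as timestamps. A minimal sketch (my own NumPy implementation of the standard formula, not code from the course):

```python
import numpy as np

def sinusoidal_encoding(positions, d_model):
    """Standard Transformer sin/cos encoding, evaluated at arbitrary
    (possibly non-integer) positions.

    positions: 1-D sequence of real-valued positions (e.g. timestamps)
    d_model:   embedding dimension (assumed even here, for simplicity)
    """
    positions = np.asarray(positions, dtype=np.float64)[:, None]  # (N, 1)
    i = np.arange(d_model // 2)[None, :]                          # (1, d/2)
    # Same frequency schedule as "Attention Is All You Need":
    # angle = pos / 10000^(2i / d_model)
    angle_rates = 1.0 / np.power(10000.0, 2.0 * i / d_model)
    angles = positions * angle_rates                              # (N, d/2)
    enc = np.empty((positions.shape[0], d_model))
    enc[:, 0::2] = np.sin(angles)  # even dims get sine
    enc[:, 1::2] = np.cos(angles)  # odd dims get cosine
    return enc

# Works identically for integer and float positions:
pe_int = sinusoidal_encoding([0, 1, 2], d_model=8)
pe_float = sinusoidal_encoding([0.0, 0.5, 1.7], d_model=8)
```

Whether such encodings behave *well* for irregularly spaced timestamps (e.g. whether nearby times get usefully similar encodings at your data's time scale) is a separate question, but at least the math goes through unchanged.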

Furthermore, this reminds me (at least superficially) of the Fourier Transform. Were the researchers motivated by it?

Hey @kechan, I totally agree with you on position encoding :slightly_smiling_face:

You may find this overview interesting. I haven’t read it yet, though. It’s worth investigating which types of positional encoding have demonstrated good results on time-series data, rather than just adapting sin/cos.

FNet demonstrated good results on some tasks.
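On the Fourier connection: the core idea in FNet is to replace the self-attention sublayer with an unparameterized 2-D Fourier transform over the sequence and hidden dimensions, keeping only the real part. A minimal sketch of just that mixing step (my own simplification, not the paper's full architecture):

```python
import numpy as np

def fourier_mixing(x):
    """FNet-style token mixing: a 2-D DFT over the sequence and hidden
    dimensions, keeping the real part. This replaces self-attention and
    has no learned parameters.

    x: array of shape (seq_len, d_model)
    """
    return np.real(np.fft.fft2(x))

# Example: mix a small random "embedding" matrix.
x = np.random.default_rng(0).standard_normal((4, 8))
mixed = fourier_mixing(x)  # same shape as x; every output mixes all tokens
```

In the actual model this sublayer is followed by the usual residual connections, layer norms, and feed-forward blocks.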

Thanks. I am currently going through the ungraded lab on Transformer Preprocessing; it is a good start.