For word embeddings it's clear how the position encoding vector differs per token. For a sequence of time series data, e.g. using a window of [0-40] to predict say [41-60], is the default way of doing the position encoding still ok?
Any recommendations for using transformers for time series? The reason I want to try and check this is that with an LSTM (or bi-LSTM) the future-looking predictions do suffer.
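For concreteness, here is a minimal sketch (not from the thread) of what the "default" sinusoidal position encoding from "Attention Is All You Need" looks like when added to a fixed input window. The window length of 41 (positions 0-40) and the model dimension of 64 are illustrative assumptions, not values from the discussion.

```python
import numpy as np

def sinusoidal_position_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                   # (seq_len, d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])             # even dimensions use sine
    enc[:, 1::2] = np.cos(angles[:, 1::2])             # odd dimensions use cosine
    return enc

# Hypothetical usage: add the encoding for positions 0-40 to the (already
# projected) time-series inputs before they enter the transformer encoder.
window = np.random.randn(41, 64)   # stand-in for projected time-series values
encoded_inputs = window + sinusoidal_position_encoding(41, 64)
```

Whether this fixed encoding is appropriate for time series, or should be replaced by a learned or time-feature-based encoding, is exactly the open question raised above.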
Hi Narayana,
Perspectives on this differ. You can start by having a look at this recent paper (in particular section 4.1).
Thanks Reinoudbosch, will take a look.