How to handle very long sequences? (6000 time steps)

I am working on a project where I need to handle sequences of around 6000 time steps. Each time step is a 28-dimensional vector. I am afraid I’ll run into the vanishing/exploding gradient problem if I process all 6000 time steps at once in an RNN or LSTM. Is there a practical way of handling such long sequences (6000 steps) for both training and inference?

Can someone suggest papers related to this, or any ideas?
Thanks in advance.

A 1-minute internet search turned up this tutorial:
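Separately from that tutorial, one commonly used option for sequences this long is truncated backpropagation through time: the sequence is cut into shorter windows, the recurrent state is carried from one window to the next, and gradients are only propagated within a window. Below is a minimal sketch of just the windowing step in plain Python. The helper name `split_into_chunks` and the window length of 200 are my own assumptions for illustration, not something from the question, and the right window length depends on the task.

```python
from typing import Iterator, Sequence, Tuple


def split_into_chunks(sequence: Sequence, chunk_len: int) -> Iterator[Tuple[int, Sequence]]:
    """Yield (start_index, chunk) pairs that cover the sequence in order.

    Hypothetical helper: the last chunk is shorter when the sequence length
    is not a multiple of chunk_len.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    for start in range(0, len(sequence), chunk_len):
        yield start, sequence[start:start + chunk_len]


# Dummy data with the shape from the question: 6000 steps, 28 features each.
sequence = [[0.0] * 28 for _ in range(6000)]

# chunk_len=200 is an assumed value, chosen only for illustration.
chunks = list(split_into_chunks(sequence, chunk_len=200))

# Number of windows, steps per window, features per step.
print(len(chunks), len(chunks[0][1]), len(chunks[0][1][0]))
```

In a training loop, each window would be fed to the RNN/LSTM in order, with the final hidden state of one window used as the initial state of the next and cut off from the gradient graph in between. At inference time no truncation is needed, since no gradients are computed; the state can simply be carried forward step by step.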