I am working on a project where I need to handle around 6000 time step sequence. 1 time step is 28 wide vector. I am afraid, I’ll get vanishing/exploding gradient problem if I process 6000 time steps all together in a RNN or LSTM. Is there any practically optimum way of handling such huge sequence(6000) both training and inference?
Can someone suggest me a papers related to this or any idea?
Thanks in advance.