I am following Course 5 and re-implemented both the RNN and the LSTM in Julia.
I ran the LSTM on the dinos.txt dataset (not currently included in the git repo code).
However, it gives me fairly low accuracy, hardly exceeding ~84%. I suspect this is because the network is trying to learn the padding, since I process my sequences in mini-batches.
I am a bit confused about how this is handled in the course. Is the model trained on a single sequence at a time? It does not look like it. So how are sequences of different lengths handled within a batch?
I tried using a loss that ignores the padding, but my results got even worse.
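To clarify what I mean by a padding-ignoring loss, here is a minimal sketch (simplified from my actual code; array shapes and names are just for illustration): cross-entropy where padded timesteps are zeroed out by a mask and the loss is averaged only over real tokens.

```julia
# ŷ: predicted probabilities, shape (vocab, T, batch)
# y: one-hot targets, same shape as ŷ
# mask: shape (T, batch); 1.0 for real timesteps, 0.0 for padding
function masked_crossentropy(ŷ, y, mask)
    ϵ = 1e-9  # avoid log(0)
    # per-timestep negative log-likelihood, summed over the vocab dimension
    nll = -dropdims(sum(y .* log.(ŷ .+ ϵ); dims=1); dims=1)  # (T, batch)
    # zero out padded positions, normalize by the number of real tokens
    return sum(nll .* mask) / max(sum(mask), 1)
end
```

With this, the padded positions contribute nothing to the gradient, yet my results got worse, which is what puzzles me.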
Thanks for any help.