Course 5: Sequence models - Handling the padding?

Hi,

I am following Course 5 and have re-implemented both the RNN and the LSTM in Julia.

I ran the LSTM on the dinos.txt dataset (this part is not currently in the git repo code).
However, the accuracy is very low, hardly exceeding 84%. I believe this is because the network is trying to learn the padding, since I process my sequences in mini-batches.

I am a bit confused about how this is handled in the course. Is the model trained on one sequence at a time? It does not look like it. If not, how are the different sequence lengths handled within a batch?

I tried using a loss that ignores the padding, but my results got even worse.
Thanks for any help.
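
For reference, here is a minimal sketch of what "a loss that ignores the padding" usually means. It is written in plain Python for illustration and is not the course's code or my Julia code. The names `PAD`, `pad_batch` and `masked_cross_entropy` are hypothetical, and using -1 as the padding marker is an assumption.

```python
import math

PAD = -1  # hypothetical marker for padded target positions (an assumption)


def pad_batch(seqs, pad=PAD):
    """Right-pad integer sequences to the longest length in the batch."""
    max_len = max(len(s) for s in seqs)
    return [list(s) + [pad] * (max_len - len(s)) for s in seqs]


def masked_cross_entropy(probs, targets, pad=PAD):
    """Mean negative log-likelihood over non-padded positions only.

    probs[b][t] is a list of class probabilities for sequence b at step t;
    targets[b][t] is the true class index, or `pad` for a padded step.
    """
    total, count = 0.0, 0
    for seq_probs, seq_targets in zip(probs, targets):
        for p, y in zip(seq_probs, seq_targets):
            if y == pad:
                continue  # padded step: adds nothing to the loss or the count
            total -= math.log(p[y])
            count += 1
    # Normalise by the number of real steps, not by batch_size * max_len.
    return total / count if count else 0.0


# Two sequences of different lengths, two classes, uniform predictions.
targets = pad_batch([[0, 1, 1], [1]])
probs = [[[0.5, 0.5] for _ in row] for row in targets]
loss = masked_cross_entropy(probs, targets)
```

The detail to check is the denominator: the sum should be divided by the number of real (non-padded) steps. If accuracy is also measured only on real steps, the reported number may drop compared with counting padded positions, because padding is typically trivial to predict.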

Sorry, I don’t have any experience with Julia.