Why no ShiftRight with LSTM in C3_W4

Every previous time we used an LSTM, it was combined with a ShiftRight.

I understood that ShiftRight is what “unwraps” the LSTM so that the same LSTMCell is applied to different training inputs.

But in C3_W4 there is no ShiftRight. Why is this? Does this not mean that the LSTM loses its power to pass state between cells?

Hi @Izak_van_Zyl_Marais

No, definitely not. tl.ShiftRight() is applied to the input, not the LSTM.
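
To see what the layer actually does, here is a minimal sketch (assuming the trax library is installed; the token values are arbitrary). ShiftRight just zero-pads the sequence on the left and drops the last position, so at position t the model only sees tokens strictly to its left:

```python
import numpy as np
from trax import layers as tl

tokens = np.array([[5, 6, 7, 8]])    # (batch=1, seq_len=4)
shift = tl.ShiftRight(mode='train')  # a pure function layer, no weights to initialize
print(shift(tokens))                 # [[0 5 6 7]] -- padded on the left, last token dropped
```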

We used tl.ShiftRight() when we wanted the model to predict the next token (which is just “the one to the right”). This week the model predicts whether two questions are duplicates, so there is no such thing as a “next token to the right”. Likewise, in the previous week (C3_W3), the model predicted NER tags, and ShiftRight was rightly not used there either.
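
For concreteness, here is a rough sketch of the two setups (layer names follow trax as used in the course; the sizes are illustrative, not the assignments' exact hyperparameters):

```python
from trax import layers as tl
from trax.fastmath import numpy as jnp

# Next-token language model (the earlier weeks): the target at position t is
# the token at t itself, so the input is shifted right to hide it from the LSTM.
lm = tl.Serial(
    tl.ShiftRight(mode='train'),
    tl.Embedding(256, 128),   # vocab_size, d_feature
    tl.LSTM(n_units=128),
    tl.Dense(256),
    tl.LogSoftmax(),
)

# One branch of the C3_W4 Siamese model: there is no "next token" target, so
# nothing is shifted; the LSTM still carries state left to right as usual.
branch = tl.Serial(
    tl.Embedding(256, 128),
    tl.LSTM(n_units=128),
    tl.Mean(axis=1),          # average the hidden states over the time axis
    tl.Fn('Normalize', lambda x: x / jnp.sqrt(jnp.sum(x * x, axis=-1, keepdims=True))),
)
# In the assignment, two such weight-sharing branches run in parallel,
# one per question.
```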

Thank you very much for the answer.

Looking through this other thread with your answers also helped to clarify it. Specifically, I was confused about where in Trax the LSTMCell gets applied to the sequence of input embeddings, one embedding at a time. I had wrongly assumed that this was the responsibility of ShiftRight(), but from the linked thread it happens implicitly and is controlled by the LSTM layer, right?

Happy to help :slight_smile:

I’m not sure I understand what you mean, so to be sure that you do not have a common misconception about what an LSTMCell and an LSTM layer are, I encourage you to look over this answer.

My confusion comes from what you mean by “one embedding at a time” and the switch between LSTMCell and LSTM layer. If you mean “one embedding vector at a time”, then yes: the LSTM (cell or layer) takes that whole vector as one of its inputs at each step. But if you mean one embedding feature at a time, then no: the LSTM does not step over the embedding dimension.
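
To make “one embedding vector at a time” concrete, here is a minimal plain-numpy sketch of the scan an LSTM layer performs internally (toy_cell is a hypothetical stand-in for the real LSTMCell gate math):

```python
import numpy as np

def lstm_layer(xs, step_fn, h0, c0):
    """xs: (seq_len, d_feature); step_fn: one application of the cell."""
    h, c = h0, c0
    outputs = []
    for t in range(xs.shape[0]):      # loop over the time axis only
        h, c = step_fn(xs[t], h, c)   # xs[t] is a whole d_feature vector
        outputs.append(h)
    return np.stack(outputs), (h, c)  # per-step hidden states + final state

d = 4
def toy_cell(x_t, h, c):
    return np.tanh(x_t + h), c        # the real cell computes gates over x_t and h

ys, (h_T, c_T) = lstm_layer(np.random.randn(10, d), toy_cell, np.zeros(d), np.zeros(d))
print(ys.shape)                       # (10, 4): one hidden state per time step
```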

I just wanted to make sure you understand this sequence of steps, because it is almost the same for an LSTM (apart from the inner calculations and one additional state, the cell state).
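
As a quick sanity check of that sequence in trax itself (assuming the usual init/call pattern; the sizes are arbitrary):

```python
import numpy as np
from trax import layers as tl, shapes

x = np.random.randn(2, 10, 16).astype(np.float32)  # (batch, seq_len, d_feature)
lstm = tl.LSTM(n_units=16)
lstm.init(shapes.signature(x))
y = lstm(x)
print(y.shape)  # (2, 10, 16): one hidden state per position; state flows across steps
```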

Thanks for linking to that excellent post. Yes, I understood it like this: “one embedding vector” at a time. However, your post really makes it crystal clear what the Layer abstraction is for. They should hire you to write the documentation for TensorFlow/Trax!