No, definitely not. tl.ShiftRight() is applied to the input, not the LSTM.
We used tl.ShiftRight() when we wanted the model to predict the next token (which is just “the one to the right”). This week the model predicts whether two questions are duplicates, so there is no “next token to the right” to predict. By the same reasoning, in the previous week (C3 W3) the model predicted NER tags, and ShiftRight was rightly not used there either.
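To make that concrete, here is a minimal sketch of what the layer does to a batch of token ids (assuming a working trax install; in train mode ShiftRight pads the start of the time axis with zeros and drops the last position):

```python
# Minimal sketch: tl.ShiftRight() on a batch of token ids, shape (batch, seq_len).
import numpy as np
from trax import layers as tl

targets = np.array([[5, 6, 7, 8]])    # the sequence the model should predict
shift = tl.ShiftRight(mode='train')   # shifts along the time axis, zero-pads
inputs = shift(targets)
print(inputs)  # expected [[0 5 6 7]]: at step t the model sees tokens < t
               # and learns to predict targets[t], i.e. "the one to the right"
```

So the shift lives entirely on the data side; the LSTM itself is untouched.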
Looking through this other thread with your answers also helped to clarify it. Specifically, I was confused about where in Trax the LSTMCell gets applied to the sequence of input embeddings, one embedding at a time. I had wrongly assumed that was the responsibility of ShiftRight(), but from that linked thread it happens implicitly and is controlled by LSTMLayer, right?
I’m not sure I understand what you mean, so to be sure you don’t have a common misconception about what an LSTMCell and an LSTMLayer are, I encourage you to look over this answer.
My confusion comes from what you mean by “one embedding at a time” and from the switching between LSTMCell and LSTMLayer. If you mean “one embedding vector at a time,” then yes: the LSTM (cell or layer) takes one embedding vector as one of its inputs at each time step. But if you mean one embedding feature at a time, then no: the LSTM does not step over the embedding dimension.
Just wanted to make sure you understand this sequence, because it’s almost the same for an LSTM (except for the inner calculations and one additional state, the cell state).
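To make the “one embedding vector per time step” point concrete, here is a hypothetical pure-numpy sketch of that scan. The names (lstm_cell, lstm_layer) and the weight layout are illustrative assumptions, not Trax’s actual internals:

```python
# Hypothetical sketch of an LSTM "layer" scanning an LSTM "cell" over a
# sequence of embeddings -- illustrative only, not Trax's implementation.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_cell(x_t, h_prev, c_prev, W, U, b):
    """One time step: gates from the current input vector and the carried state."""
    z = x_t @ W + h_prev @ U + b              # all four gate pre-activations
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g                    # cell state: the "one additional state"
    h = o * np.tanh(c)                        # hidden state, also the step's output
    return h, c

def lstm_layer(embeddings, W, U, b):
    """Scan the cell over the TIME axis: one embedding vector per step."""
    d = embeddings.shape[1]
    h, c = np.zeros(d), np.zeros(d)
    outputs = []
    for x_t in embeddings:                    # iterates over time steps,
        h, c = lstm_cell(x_t, h, c, W, U, b)  # never over embedding features
        outputs.append(h)
    return np.stack(outputs)

# Tiny usage: 5 time steps, embedding dim = hidden dim = 4 -> output (5, 4)
rng = np.random.default_rng(0)
d = 4
W, U = rng.normal(size=(d, 4 * d)), rng.normal(size=(d, 4 * d))
print(lstm_layer(rng.normal(size=(5, d)), W, U, np.zeros(4 * d)).shape)
```

The layer only decides how the cell is swept over the sequence; the per-step math lives in the cell.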
Thanks for linking to that excellent post. Yes, I understood it like this: “one embedding vector” at a time. However, your post really makes it crystal clear what the Layer abstraction is for. They should hire you to write the documentation for TensorFlow/Trax!