Why multiple LSTM layers for encoder and only one LSTM for decoder?

Hi everyone,

Adding to the very relevant questions from https://community.deeplearning.ai/t/questions-regarding-course-4-week-1/175440, I wanted to ask the following: Why are several serial LSTM layers used for the encoder part, while only one LSTM layer is used for the decoder?

If I may ask one more question: Regarding trax’s LSTM implementation, the layer actually has several inputs/outputs (c, a, y hat, t), right? Does it simply “know” how to connect and forward stacked LSTMs? What if we wanted to inspect or somehow use these various inputs/outputs from the middle or final layers?

Thanks in advance,

Hi @Jose_Leal_Domingues

Actually, the decoder does use n_decoder_layers in the NMTAttn() function (UNQ_C4, step 7). pre_attention_decoder_fn(), on the other hand, uses a single LSTM layer. It’s a design choice, but you “might see” why it makes sense not to use multiple layers in the pre-attention decoder and then use multiple layers again in NMTAttn.
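Just to make the layout concrete, here is a plain-Python sketch of where the LSTM layers sit. The list entries are illustrative names only (not actual trax layers), and the default layer counts are assumptions, not the assignment’s exact values:

```python
def nmt_attn_layout(n_encoder_layers=2, n_decoder_layers=2):
    """Illustrative NMTAttn-style layout (names only, not runnable trax code)."""
    input_encoder = ["Embedding"] + ["LSTM"] * n_encoder_layers
    # pre-attention decoder: a single LSTM producing the attention queries
    pre_attention_decoder = ["ShiftRight", "Embedding", "LSTM"]
    # after attention, the stacked decoder LSTMs appear
    post_attention = (["AttentionQKV"]
                      + ["LSTM"] * n_decoder_layers
                      + ["Dense", "LogSoftmax"])
    return input_encoder, pre_attention_decoder, post_attention

enc, pre, post = nmt_attn_layout()
print(pre.count("LSTM"), post.count("LSTM"))  # 1 2
```

So “one LSTM for the decoder” refers only to the pre-attention part; the post-attention trunk still stacks n_decoder_layers of them.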

Short answer - yes. Each LSTM cell gets the same number (shape) of inputs and outputs, which is why you need to initialize the hidden states for the very first step (trax also helps you with initialization by creating the shapes and values of the initial hidden states).
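The “same shapes, so stacking just works” point can be shown with a minimal numpy sketch (my own toy LSTM cell, not trax’s implementation): each layer maps a d-vector to a d-vector, so the hidden-state output of one layer feeds directly into the next, and the first step uses zero-initialized states.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h, c, W, b):
    """One LSTM step: input x and previous states (h, c) -> new (h, c)."""
    z = np.concatenate([x, h]) @ W + b          # all four gates in one matmul
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_new = f * c + i * g                       # new cell state
    h_new = o * np.tanh(c_new)                  # new hidden state / output
    return h_new, c_new

d, n_layers = 4, 3
rng = np.random.default_rng(0)
# every layer maps a d-vector to a d-vector, so W is (2d, 4d) for each layer
params = [(rng.normal(scale=0.1, size=(2 * d, 4 * d)), np.zeros(4 * d))
          for _ in range(n_layers)]
# the very first step needs initial hidden states; zeros are the usual choice
states = [(np.zeros(d), np.zeros(d)) for _ in range(n_layers)]

x = rng.normal(size=d)
for layer, (W, b) in enumerate(params):
    h, c = states[layer]
    h, c = lstm_cell(x, h, c, W, b)
    states[layer] = (h, c)
    x = h  # same shape as the input, so it feeds the next layer directly

print(x.shape)  # (4,)
```

Because the output shape equals the input shape, the stacking loop needs no glue code - exactly why a Serial combinator can chain LSTM layers without you wiring (h, c) by hand.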

There are a number of ways you could do that, and they vary in simplicity. If you just wanted to “check” them, you could simply debug the code and follow the trace.
You could also inherit from the class and write your own forward function, perhaps dumping the values to a file.
There are also more sophisticated ways of doing it, which would probably require a whole course of their own :slight_smile:
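The “wrap the layer and dump its values” idea can be sketched in plain Python (Tap is a hypothetical helper of mine, not a trax class; the lambda stands in for any layer whose outputs you want to capture):

```python
class Tap:
    """Wraps any callable layer and records every output it produces."""
    def __init__(self, layer):
        self.layer = layer
        self.recorded = []

    def __call__(self, *args):
        out = self.layer(*args)
        self.recorded.append(out)  # keep the intermediate value for inspection
        return out

# stand-in for an LSTM layer: any callable with matching in/out shapes works
double = Tap(lambda x: 2 * x)
for v in [1, 2, 3]:
    double(v)
print(double.recorded)  # [2, 4, 6]
```

Overriding forward() in a subclass works the same way: compute the outputs, stash or log them, then return them unchanged so the rest of the stack is unaffected.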