Pre- and post-attention LSTM cells in Week 3 Assignment 1

In the first assignment of Week 3 (date translation), the function modelf() is implemented in four steps. The first step creates a bidirectional LSTM layer that takes the input X, and the second step is a for-loop that runs Ty times.

Why don't we have a for-loop over Tx steps in the first step? The input X has length Tx, and from the diagram of the model, the pre-attention LSTM should accept one character at a time.

Or is it because the Bidirectional layer can accept the whole sentence X and then internally feeds it to the LSTM cells one character at a time?

In the Week 1 assignment (the Jazz Improvisor), the LSTM layer is driven by a for-loop that runs Tx times, so I am confused about how to use an LSTM cell in general. Should we manually write a for-loop, or can we just feed the whole sentence to it?

Can I just run LSTM(return_sequences=True)(X), or shall I run something like this:

lstm_cell = LSTM(n_units, return_state=True)  # create the layer once so every step reuses the same weights
for character in X:
    s, _, c = lstm_cell(character, initial_state=[s, c])
    output.append(s)

Do they give the same output sequence?

Hi @ken2022 ,

In the first step of the function modelf(), a Bidirectional LSTM layer is created that accepts the input X as a whole sentence. The Bidirectional wrapper lets the LSTM process the input in both directions, so it can use context from both the past and the future when encoding each character.
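
As a rough sketch (with placeholder sizes rather than the assignment's exact values), the pre-attention encoder can be set up like this:

```python
from tensorflow.keras.layers import Input, LSTM, Bidirectional

# Placeholder sizes, purely for illustration.
Tx, vocab_size, n_a = 30, 37, 32

X = Input(shape=(Tx, vocab_size))
# The Bidirectional wrapper runs the LSTM over all Tx steps of X in a single call,
# once left-to-right and once right-to-left, then concatenates the hidden states.
a = Bidirectional(LSTM(n_a, return_sequences=True))(X)   # shape (batch, Tx, 2*n_a)
```

No explicit for-loop is needed here because the layer itself iterates over the Tx time steps internally.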

In the second step, a for-loop iterates over Ty time steps. This is because the goal is to generate a new output sequence of length Ty, one character at a time. At each step, the post-attention LSTM uses the hidden state and cell state from the previous time step, together with the current input (the attention context), to generate the next output.
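
A minimal, self-contained sketch of that pattern (placeholder sizes, and the attention context replaced by a random tensor just so the snippet runs on its own) might look like this:

```python
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense

# Placeholder sizes, purely for illustration.
batch, n_s, Ty, out_vocab = 1, 64, 10, 11

# Shared layers: created once, outside the loop, so every time step reuses the same weights.
post_lstm = LSTM(n_s, return_state=True)
output_layer = Dense(out_vocab, activation="softmax")

# Initial hidden and cell states of the post-attention LSTM.
s = tf.zeros((batch, n_s))
c = tf.zeros((batch, n_s))

outputs = []
for t in range(Ty):
    # In the assignment, 'context' comes from the attention mechanism over the
    # encoder activations; a random tensor stands in for it here.
    context = tf.random.uniform((batch, 1, n_s))
    s, _, c = post_lstm(context, initial_state=[s, c])
    outputs.append(output_layer(s))   # one softmax prediction per output time step
```

The important detail is that the layers are created once and then called Ty times inside the loop, so the same weights are reused at every output step.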

In general, whether to use a for-loop or not depends on the specific task and architecture. In this case, using a bidirectional LSTM in the first step and a for-loop in the second step allows the model to efficiently process the input sentence and generate a new output sequence.

If you want to run an LSTM over a whole sequence, you can use LSTM(return_sequences=True) and feed the input sequence X to it directly. It will output the hidden state for every time step.

On the other hand, if you want to feed the sequence to the LSTM one character at a time, you can use a for-loop as in your example, although it is not the most efficient way to process the sequence. As long as the same layer (and therefore the same weights) is used at every step and the states are carried over between steps, both approaches produce the same sequence of hidden states.
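
A quick way to check this (a minimal sketch with made-up sizes and random data) is to run one LSTM layer both ways and compare the outputs:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM

# Made-up sizes and data, just for the comparison.
Tx, n_features, n_units = 5, 8, 16
x = np.random.rand(1, Tx, n_features).astype("float32")

# A single LSTM layer, so both approaches share the same weights.
lstm = LSTM(n_units, return_sequences=True, return_state=True)

# (a) Feed the whole sequence at once.
seq_out, _, _ = lstm(x)                       # shape (1, Tx, n_units)

# (b) Feed one time step at a time, carrying the states forward.
s = tf.zeros((1, n_units))
c = tf.zeros((1, n_units))
steps = []
for t in range(Tx):
    out, s, c = lstm(x[:, t:t+1, :], initial_state=[s, c])
    steps.append(out)                         # each out has shape (1, 1, n_units)
step_out = tf.concat(steps, axis=1)           # shape (1, Tx, n_units)

print(np.allclose(seq_out.numpy(), step_out.numpy(), atol=1e-5))  # expected: True
```

The loop version makes many small layer calls instead of one big one, which is why the single call with return_sequences=True is usually faster.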

Hope this answers your question.

Best Regards

Muhammad John Abbas

Thanks for your clarification. I got it now.