Pre and Post-attention LSTM cells in Week 3 Assignment 1

In the first assignment of Week 3 (date translation), the function modelf() has four steps in its implementation. The first step creates a bidirectional LSTM layer that accepts the input X. The second step is a for-loop that runs Ty times.

Why don’t we have a for-loop that runs Tx times in the first step? The input X has length Tx, so from the diagram of the model, the pre-attention LSTM should accept one character at a time.

Or is it because the Bidirectional layer makes it possible to accept the whole sentence X, and it then feeds one character at a time to the LSTM cells internally?

In the Week 1 assignment (the Jazz Improviser), the LSTM layer is applied inside a for-loop that runs Tx times. So I am confused about how we should use an LSTM cell in general. Should we manually create a for-loop, or can we just feed the whole sentence to it?

Can I just run LSTM(return_sequences=True)(X), or should I run something like

for character in X:
    s, _, c = LSTM(units, return_state=True)(character, initial_state=[s, c])

Do they give the same output sequence?

Hi @ken2022 ,

In the first step of the function modelf(), a Bidirectional LSTM layer is created that accepts the input X as a whole sequence. The Bidirectional wrapper lets the LSTM process the input in both directions, so each character’s hidden state incorporates context from both the past and the future.
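To make this concrete, here is a minimal sketch of the encoder step, assuming hypothetical sizes (Tx=30, vocab_size=37, n_a=32) rather than the assignment’s actual values:

```python
import numpy as np
from tensorflow.keras.layers import Bidirectional, LSTM, Input
from tensorflow.keras.models import Model

Tx, vocab_size, n_a = 30, 37, 32  # assumed toy sizes, not the assignment's

X = Input(shape=(Tx, vocab_size))
# One call processes all Tx steps internally; return_sequences=True keeps
# the hidden state at every step, which the attention mechanism needs later.
a = Bidirectional(LSTM(n_a, return_sequences=True))(X)
model = Model(inputs=X, outputs=a)

# The encoder output has one 2*n_a vector per input step (forward and
# backward states are concatenated).
out = model.predict(np.zeros((1, Tx, vocab_size)), verbose=0)
print(out.shape)  # (1, 30, 64)
```

Note there is no explicit Python loop: the layer itself iterates over the Tx steps when you call it on the whole sequence.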

In the second step, a for-loop iterates over Ty time steps, because the goal is to generate a new output sequence of length Ty, one character at a time. At each step, the post-attention LSTM uses the hidden state and cell state from the previous step, together with the current context vector, to produce the next output.

In general, whether to use a for-loop depends on the task and architecture. Here, a bidirectional LSTM in the first step lets the model process the whole input sentence in one call, while the for-loop in the second step generates the output sequence one step at a time.

If you want to apply an LSTM to a whole sequence, you can use LSTM(return_sequences=True) and feed the input sequence X to it directly. It will output the hidden state at every time step.

On the other hand, if you want to feed the LSTM one character at a time, you can use a for-loop as in your example. Both approaches compute the same recurrence, but the manual loop is less efficient and is only really needed when each step’s input depends on the previous step’s output, as in the decoding loop of this assignment.
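You can verify the equivalence yourself. This sanity-check sketch (toy sizes are assumptions) feeds a sequence through one LSTM layer in a single call, then steps through the same sequence manually with the same layer and explicit states, and compares the hidden-state sequences:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import LSTM

Tx, n_a, n_x = 5, 8, 3  # assumed toy sizes
x = tf.random.normal((1, Tx, n_x))

layer = LSTM(n_a, return_sequences=True, return_state=True)
whole, _, _ = layer(x)  # one call over all Tx steps

# Manual loop with the SAME layer (same weights), threading the states
s = tf.zeros((1, n_a))
c = tf.zeros((1, n_a))
stepwise = []
for t in range(Tx):
    out, s, c = layer(x[:, t:t+1, :], initial_state=[s, c])
    stepwise.append(out[:, 0, :])
stepwise = tf.stack(stepwise, axis=1)

print(np.allclose(whole.numpy(), stepwise.numpy(), atol=1e-5))  # True
```

So yes, they give the same output sequence; the whole-sequence call is just the vectorized form of the loop.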

Hope this answers your question.

Best Regards

Muhammad John Abbas

Thanks for your clarification. I got it now.