LSTM with or without loop?


I have almost finished the course but I realize that I don’t understand many points.

First of all, when do we have to use a loop over the time steps, as in the “Neural Machine Translation” and “Jazz Improvisation” exercises, and when should we not use one, as in the “Emojify” exercise? I have seen the corresponding answer here (No loop over Tx in the Emoji model), but I am still confused, because in the “Neural Machine Translation” problem we know the whole sequence in advance and we still use the loop.

Second, why does the LSTM layer behave differently inside the loop than without it? I mean, without a loop, the LSTM layer automatically processes all time steps (until the end of the sequence). But if we use the LSTM layer inside a loop, it only processes one step to the right.

Finally, can we use the TimeDistributed layer and “return_sequences=True” in the LSTM layer to avoid using a loop?

I really don’t understand what is happening, so please help me!

Thank you very much for the help !

Hey @LE_Quang_Linh,
When the entire input sequence is known in advance, we can use TensorFlow’s built-in layers, where available, to process all the time-steps in one call. On the other hand, if it is not known in advance, say because the prediction from the previous cell has to be fed as the input to the next cell, we use the for-loop. I am assuming you are clear on this much.

This should clear up the Emojify assignment: we know the sequences in advance, hence there is no for-loop in the LSTM-based model. As for the Jazz Improvisation assignment, you could avoid the for-loop in the djmodel function, since during training we know the input sequence in advance. In the music_inference model, however, we have to stick with the for-loop, since the prediction from the previous cell has to be fed into the next cell as its input.
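To make the two cases concrete, here is a minimal NumPy sketch. It uses a plain tanh RNN cell instead of an LSTM, and the names step, run_known_sequence and generate, as well as all the sizes, are made up for illustration; none of this is the assignment code. In the first function every input exists before we start, while in the second the input for step t only exists after step t-1 has produced its prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_values, n_a = 5, 4                       # hypothetical vocabulary and hidden sizes
Wx = rng.normal(size=(n_values, n_a))
Wa = rng.normal(size=(n_a, n_a))
Wy = rng.normal(size=(n_a, n_values))

def step(x_t, a_prev):
    """One recurrent step: new hidden state and output scores."""
    a_t = np.tanh(x_t @ Wx + a_prev @ Wa)
    return a_t, a_t @ Wy

def run_known_sequence(X):
    """Training-like case: X (Tx, n_values) is fully known up front."""
    a = np.zeros(n_a)
    scores = []
    for x_t in X:                          # a built-in layer hides this iteration from us
        a, y_t = step(x_t, a)
        scores.append(y_t)
    return np.stack(scores)

def generate(start_index, Ty):
    """Inference-like case: each input is the previous step's prediction."""
    a = np.zeros(n_a)
    x_t = np.eye(n_values)[start_index]
    picked = []
    for _ in range(Ty):
        a, y_t = step(x_t, a)
        idx = int(np.argmax(y_t))
        picked.append(idx)
        x_t = np.eye(n_values)[idx]        # cannot be built before this step has run
    return picked

picked = generate(start_index=0, Ty=6)

# Once the sequence exists, the "known sequence" path reproduces it.
X = np.eye(n_values)[[0] + picked[:-1]]
replayed = np.argmax(run_known_sequence(X), axis=1).tolist()
print(picked == replayed)
```

The last lines replay the generated sequence through run_known_sequence, which gives the same predictions: the cell is the same in both cases, and only the availability of the inputs differs.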

Now, let’s come to the Neural Machine Translation assignment. In this assignment, during both training and inference, we know the entire input sequence. However, at each time-step we use the attention mechanism, i.e., the one-step attention. For starters, this is a key part of the assignment, so it makes sense for learners to implement it themselves. And since it is implemented as a manual function that has to be called on each time-step individually, we must use the for-loop. Not only this, the attention mechanism used in this assignment is based on Bahdanau et al., whereas the attention mechanism that TensorFlow offers is the dot-product attention layer, a.k.a. Luong-style attention, which you can find here. You can read more about the difference between the two here. Since, as far as I know, there is no TensorFlow built-in function that implements exactly what we need, we once again have to write the function ourselves, which requires a for-loop.

Now, you might be thinking: can we implement the one_step_attention function in such a way that it computes the attention for all the time-steps at once, so that we can avoid the for-loop? I don’t think that would be possible, since to compute the attention for any one time-step we need s_prev (the previous hidden state of the post-attention LSTM), which we can only obtain if we iterate over the time-steps one by one.
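Here is a rough NumPy sketch of that dependency. It uses a generic additive (Bahdanau-style) scoring function and a plain tanh update in place of the post-attention LSTM, so it is a simplification and not the assignment’s implementation; all weight names and sizes are hypothetical. The point is only that the attention weights for output step t need s from step t-1.

```python
import numpy as np

rng = np.random.default_rng(1)
Tx, Ty, n_a, n_s, n_e = 6, 3, 4, 5, 8      # hypothetical sizes

a = rng.normal(size=(Tx, n_a))             # encoder outputs: all known in advance
W_a = rng.normal(size=(n_a, n_e))
W_s = rng.normal(size=(n_s, n_e))
v = rng.normal(size=(n_e,))
W_c = rng.normal(size=(n_a, n_s))          # weights of the stand-in post-attention cell
W_ss = rng.normal(size=(n_s, n_s))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def one_step_attention(a, s_prev):
    """Additive attention over all Tx encoder outputs for ONE output step."""
    energies = np.tanh(a @ W_a + s_prev @ W_s) @ v    # shape (Tx,)
    alphas = softmax(energies)                        # attention weights, sum to 1
    context = alphas @ a                              # shape (n_a,)
    return context, alphas

s = np.zeros(n_s)
all_alphas = []
for t in range(Ty):
    context, alphas = one_step_attention(a, s)        # needs s from the previous step
    s = np.tanh(context @ W_c + s @ W_ss)             # stand-in for the post-attention LSTM
    all_alphas.append(alphas)
all_alphas = np.stack(all_alphas)                     # shape (Ty, Tx)
```

Within one output step everything is vectorized over the Tx input positions; it is only the loop over the Ty output steps that cannot be removed in this formulation, because s changes after every step.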

P.S. - I have classes right now; I will answer the rest of your question once they are over.


Hey @LE_Quang_Linh,

The LSTM layer behaves exactly the same, whether we use it inside a for-loop or outside one. You see a difference in the output because of the difference in the input. Without the for-loop, we feed it the data for all the time-steps, whereas inside a for-loop we feed it the data for only a single time-step (and pass the hidden and cell states along ourselves), hence the difference in the outputs. You can see that the LSTM layer behaves exactly the same by running a small piece of code. Check out Version 13 of this kernel.
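Independently of that kernel, the same point can be checked with a small NumPy sketch. The lstm_layer function below is a hand-written toy LSTM (hypothetical names, sizes and gate ordering, not the Keras implementation) that simply processes however many time-steps it is given. Calling it once on the whole sequence and calling it once per time-step, while carrying the states forward, give the same hidden states.

```python
import numpy as np

rng = np.random.default_rng(42)
n_x, n_h, Tx = 3, 4, 5                                 # hypothetical sizes
W = rng.normal(scale=0.5, size=(n_x + n_h, 4 * n_h))   # all four gates stacked
b = np.zeros(4 * n_h)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(X, h0, c0):
    """Process every time-step present in X (shape (T, n_x)).

    Returns the hidden state at each step plus the final (h, c)."""
    h, c = h0, c0
    outputs = []
    for x_t in X:
        z = np.concatenate([x_t, h]) @ W + b
        i, f, o, g = np.split(z, 4)                    # input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs), h, c

X = rng.normal(size=(Tx, n_x))
zeros = np.zeros(n_h)

# Without an outer loop: one call sees all Tx time-steps.
full_outputs, _, _ = lstm_layer(X, zeros, zeros)

# With an outer loop: each call sees one time-step, and we pass the states along.
h, c = zeros, zeros
looped = []
for t in range(Tx):
    out, h, c = lstm_layer(X[t:t + 1], h, c)
    looped.append(out[0])
looped_outputs = np.stack(looped)

print(np.allclose(full_outputs, looped_outputs))       # True
```

If the states were not passed from one call to the next, every single-step call would start from zeros and the outputs would differ, which is another way the loop version can appear to behave differently.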

I guess this kernel also shows how you can avoid the for-loop if you want. You can read a little more about the TimeDistributed layer here. However, as I mentioned in the post above, due to the way we compute the attention in the Neural Machine Translation assignment, you won’t be able to avoid the for-loop there, to the best of my knowledge. I hope this helps.
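As a conceptual illustration of what wrapping a Dense layer in TimeDistributed does to the output of an LSTM with return_sequences=True, here is a NumPy sketch (hypothetical weights and sizes, not Keras code): the same dense weights are applied to the hidden state of every time-step, which can be written either as a loop over the time-steps or as a single matrix product.

```python
import numpy as np

rng = np.random.default_rng(7)
Tx, n_h, n_out = 5, 4, 3                   # hypothetical sizes

H = rng.normal(size=(Tx, n_h))             # stand-in for LSTM outputs at every time-step
W_dense = rng.normal(size=(n_h, n_out))    # ONE set of dense weights, shared across steps
b_dense = rng.normal(size=(n_out,))

def dense(h_t):
    """Dense layer applied to a single time-step."""
    return h_t @ W_dense + b_dense

# Explicit loop: apply the same dense layer step by step.
per_step = np.stack([dense(H[t]) for t in range(Tx)])

# No loop: the same weights applied to all time-steps in one product.
all_at_once = H @ W_dense + b_dense

print(np.allclose(per_step, all_at_once))  # True
```

This works without a loop because the dense layer at step t only needs the hidden state at step t, unlike the attention step, which needs the state produced by the previous output step.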