Hey @LE_Quang_Linh,
When the entire input sequence is known to us in advance, we can use the TensorFlow built-in functions (when available). On the other hand, if it is not known in advance, say we need the prediction from the previous cell to be fed as the input to the next cell, then we have to use a for-loop. I am assuming you are clear on this much.
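To make the distinction concrete, here is a minimal sketch (not taken from any assignment; all sizes and names like `densor` are illustrative). In the first case the whole known sequence goes through a built-in LSTM layer in one call; in the second case we have to loop, because the input at step t is the prediction from step t-1:

```python
import tensorflow as tf

# Case 1: the full input sequence is known in advance, so one call to a
# built-in layer processes every time-step at once.
inputs = tf.keras.Input(shape=(10, 8))               # Tx=10, 8 features (illustrative)
seq_out = tf.keras.layers.LSTM(64, return_sequences=True)(inputs)

# Case 2: each step's input is the previous step's prediction, so we
# iterate over the time-steps ourselves with an LSTM cell.
cell = tf.keras.layers.LSTMCell(64)
densor = tf.keras.layers.Dense(8, activation="softmax")

x = tf.zeros((1, 8))                                 # hypothetical start input
states = [tf.zeros((1, 64)), tf.zeros((1, 64))]      # initial hidden and cell states
outputs = []
for t in range(10):
    h, states = cell(x, states)                      # one time-step of the LSTM
    x = densor(h)                                    # the prediction feeds the next step
    outputs.append(x)
```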
And this should clear up the Emojify assignment (in which we know the sequences in advance, hence no for-loop in the LSTM-based model). As for the Jazz Improvisation assignment, in the djmodel function you could avoid the for-loop, since during training we know the input sequence in advance; however, in music_inference_model we have to stick with the for-loop, since we need the prediction from the previous cell to be fed into the next cell as its input, as in the sketch below.
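A rough eager-mode sketch of that inference pattern (the sizes and variable names are illustrative, not the exact ones from the notebook): at each step we take the argmax of the prediction, one-hot encode it, and feed it back in as the next input, so there is no way around stepping through time one cell at a time.

```python
import tensorflow as tf

n_values, n_a, Ty = 90, 64, 50                       # illustrative sizes
LSTM_cell = tf.keras.layers.LSTMCell(n_a)
densor = tf.keras.layers.Dense(n_values, activation="softmax")

x = tf.zeros((1, n_values))                          # initial input
states = [tf.zeros((1, n_a)), tf.zeros((1, n_a))]    # initial hidden and cell states
outputs = []
for t in range(Ty):
    a, states = LSTM_cell(x, states)
    out = densor(a)                                  # distribution over the next value
    outputs.append(out)
    # convert the prediction back into the input for the next time-step
    idx = tf.argmax(out, axis=-1)
    x = tf.one_hot(idx, depth=n_values)
```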
Now, let’s come to the Neural Machine Translation assignment. In this assignment, we know the entire input sequence during both training and inference. However, at each time-step we apply the attention mechanism, i.e., one_step_attention. For starters, this is a key part of the assignment, and hence it makes sense for learners to implement it themselves. And since it is implemented as a manual function that has to be called on each time-step individually, we must use the for-loop. Not only that, the attention mechanism used in this assignment is based on Bahdanau et al., whereas the attention mechanism that TensorFlow offers is the dot-product attention layer, a.k.a. Luong-style attention, which you can find here. You can read more about the difference between the two here. Since there is no TensorFlow built-in function that implements what we need, we once again have to write the function ourselves, which in turn necessitates the for-loop.
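Roughly speaking, such a Bahdanau-style one_step_attention can be put together from standard Keras layers like this (just a sketch; the dense-layer sizes and Tx here are illustrative and may not match the grader's exact dimensions):

```python
import tensorflow as tf

Tx = 30                                              # illustrative number of input time-steps

repeator  = tf.keras.layers.RepeatVector(Tx)
concator  = tf.keras.layers.Concatenate(axis=-1)
densor1   = tf.keras.layers.Dense(10, activation="tanh")
densor2   = tf.keras.layers.Dense(1, activation="relu")
activator = tf.keras.layers.Softmax(axis=1)          # normalise over the Tx axis
dotor     = tf.keras.layers.Dot(axes=1)

def one_step_attention(a, s_prev):
    """Compute one context vector from the encoder activations `a`
    (shape (m, Tx, 2*n_a)) and the previous post-attention LSTM state
    `s_prev` (shape (m, n_s))."""
    s_prev = repeator(s_prev)                        # (m, Tx, n_s)
    concat = concator([a, s_prev])                   # (m, Tx, 2*n_a + n_s)
    e = densor1(concat)
    energies = densor2(e)                            # (m, Tx, 1)
    alphas = activator(energies)                     # attention weights over the time-steps
    context = dotor([alphas, a])                     # (m, 1, 2*n_a)
    return context
```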
Now, you might be thinking: can we implement the one_step_attention function in such a way that it computes the attention for all the time-steps at once, so that we can avoid the for-loop? I don’t think that would be possible, since computing the attention for any one time-step requires s_prev (the previous hidden state of the post-attention LSTM), which we can only obtain by iterating over the time-steps one by one, as the sketch below illustrates.
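Reusing the one_step_attention sketch from above (again with illustrative sizes), the sequential dependency looks like this: s at step t is produced by the post-attention LSTM at step t, and it is exactly what one_step_attention needs at step t+1, so the time-steps cannot all be computed in one shot.

```python
import tensorflow as tf

Ty, n_s, machine_vocab = 10, 64, 11                  # illustrative sizes
post_LSTM_cell = tf.keras.layers.LSTMCell(n_s)
output_layer = tf.keras.layers.Dense(machine_vocab, activation="softmax")

m, Tx, n_a = 1, 30, 32                               # must match the sketch above (Tx=30)
a = tf.random.normal((m, Tx, 2 * n_a))               # stand-in for the pre-attention Bi-LSTM activations
s = tf.zeros((m, n_s))                               # s_prev for the first step
c = tf.zeros((m, n_s))
outputs = []
for t in range(Ty):
    # the context at step t needs s from step t-1, which is why the
    # time-steps have to be processed one after the other
    context = one_step_attention(a, s)               # the sketch above
    context = tf.squeeze(context, axis=1)            # (m, 2*n_a)
    _, [s, c] = post_LSTM_cell(context, [s, c])
    outputs.append(output_layer(s))
```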
P.S. - I have classes right now; I will answer the rest of your query once they are over.
Cheers,
Elemento