I’ve just completed the course, but I still have a question. Why is LSTM better than a standard RNN for time series prediction?
I guess it depends on the nature of your time series data and its stateful properties. Does it have cases analogous to language data, in which things that occur at an earlier point affect things that come potentially much later in the sequence? If so, then LSTM should be better, since that type of “memory spanning time” is what it is designed to provide. I’m not an expert or practitioner of any of this, so this is just my intuition based on what Prof Ng says in the lectures. My guess would be that LSTM can only be better than RNN, but there may be cases in which it is only “as good as” a “plain vanilla” RNN. But the expense of training it is greater, since the computations it does and the state it learns are more complex. So maybe the approach would be to compare LSTM vs RNN on a relatively small training set of your data early in your design process and see if you notice any performance differential. If not, then it probably makes more sense to stick with RNN.
Of course this is all with the proviso I mentioned above: don’t take my word for it. Now that you’ve been through the full course, it might be worth going back and rewatching the lectures in Week 1 where Prof Ng introduces and explains LSTM. With all that you’ve learned in the interim and with this type of question in mind, you might get a deeper perspective on what he says when you listen to it again. I’ll bet he makes some comments there that would shed more light on your question.
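To put a rough number on the training-cost point, here is a minimal sketch that counts trainable parameters for one recurrent layer. It assumes the standard formulations (a simple RNN cell with input weights, recurrent weights and a bias; an LSTM cell with four such blocks for the input, forget and output gates plus the candidate state). The function names and the example sizes are made up for illustration, and parameter count is only a proxy for compute cost.

```python
def simple_rnn_params(input_dim, units):
    """Trainable parameters of one simple RNN layer (standard formulation)."""
    # input-to-hidden weights + hidden-to-hidden weights + bias
    return units * input_dim + units * units + units


def lstm_params(input_dim, units):
    """Trainable parameters of one standard LSTM layer."""
    # Four blocks of the same shape as a simple RNN cell:
    # input gate, forget gate, output gate, candidate cell state.
    return 4 * simple_rnn_params(input_dim, units)


# Hypothetical sizes: a univariate time series and 32 hidden units.
input_dim, units = 1, 32
print("simple RNN:", simple_rnn_params(input_dim, units))
print("LSTM:      ", lstm_params(input_dim, units))
```

For the same number of hidden units, the LSTM layer has four times as many weights to learn, which is one reason it is worth checking on a small subset whether the extra capacity actually pays off for your data.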
Hi @kittipongko,
in addition to Paul’s excellent answer:
LSTM can be used as a layer in an RNN architecture. In Keras it’s a built-in layer, see also: Working with RNNs | TensorFlow Core
RNNs w/o LSTM layers struggle in practice with modelling long-term dependencies, see also this explanation:
In theory, classic (or “vanilla”) RNNs can keep track of arbitrary long-term dependencies in the input sequences. The problem with vanilla RNNs is computational (or practical) in nature: when training a vanilla RNN using back-propagation, the long-term gradients which are back-propagated can “vanish” (that is, they can tend to zero) or “explode” (that is, they can tend to infinity),[15] because of the computations involved in the process, which use finite-precision numbers. RNNs using LSTM units partially solve the vanishing gradient problem, because LSTM units allow gradients to also flow unchanged. However, LSTM networks can still suffer from the exploding gradient problem.
(Source)
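The effect described in the quote can be seen in a toy example. The sketch below is a deliberately simplified, assumed setup, not how a real network is trained: a scalar recurrence with no inputs, where the gradient of the last state with respect to the first is just a product of per-step factors. The function names and the weight values are hypothetical. The last function looks only at the LSTM cell-state path, where the per-step factor is the forget gate, and ignores how the gates themselves depend on earlier states.

```python
import math


def tanh_rnn_gradient(w, steps, h0=0.5):
    """|d h_T / d h_0| for the scalar recurrence h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain rule through tanh
    return abs(grad)


def linear_rnn_gradient(w, steps):
    """|d h_T / d h_0| for the linear recurrence h_t = w * h_{t-1}."""
    return abs(w) ** steps


def cell_state_gradient(forget_gate, steps):
    """d c_T / d c_0 along the LSTM cell-state path c_t = f * c_{t-1} + ...,
    assuming a constant forget gate value f (a simplification)."""
    return forget_gate ** steps


steps = 50
print("tanh RNN, w=0.9:    ", tanh_rnn_gradient(0.9, steps))     # shrinks toward zero
print("linear RNN, w=1.5:  ", linear_rnn_gradient(1.5, steps))   # grows very large
print("LSTM cell, f=0.99:  ", cell_state_gradient(0.99, steps))  # stays of order one
```

With a forget gate close to 1, the cell-state path passes the gradient along almost unchanged, which is the sense in which LSTM units partially address vanishing gradients. Nothing in that path stops gradients from growing, so exploding gradients can still occur.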
LSTMs also have some limitations in practice. If you are interested in how these can be mitigated with more advanced transformer architectures, feel free to check out this article.
Hope that helps!
Best regards
Christian