Can anyone clarify LSTMs to me? Better with an example

Please provide as much detail as you can.

Great question!

If you’re starting out, I’d highly recommend Andrew Ng’s videos in Course 5 of the Deep Learning Specialization (Sequence Models) — he gives one of the most beginner-friendly walkthroughs of LSTMs. Also, Course 3 of the NLP Specialization (on Sequence Models for NLP) builds on that and applies LSTMs to real tasks like text generation and machine translation.

But what is an LSTM?

An LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) designed to better capture patterns over longer sequences. Standard RNNs tend to “forget” earlier parts of the input if the sequence is long — LSTMs solve this by adding gates that control what to keep, update, or forget.

Each LSTM unit has:

  • A forget gate (what to discard from memory),
  • An input gate (what new information to store),
  • And an output gate (what to pass to the next layer/time step).

This gating mechanism helps it remember important context — like grammar or meaning — over long sentences.
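To make the three gates concrete, here's a minimal sketch of a single LSTM step in plain Python. It's a scalar (one-unit) version just to show the math; the weights in `W` are made-up placeholders, not learned values, and a real implementation would use vectors and matrices.

```python
import math

def sigmoid(x):
    # Squashes any number into (0, 1) -- gates use this as a "how much" dial.
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM time step for a single unit.

    W maps each gate name to (input weight, recurrent weight, bias).
    These weights are illustrative; in practice they are learned.
    """
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])   # forget gate
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])   # input gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2]) # candidate memory
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])   # output gate

    c = f * c_prev + i * g       # new cell state: keep some old, write some new
    h = o * math.tanh(c)         # new hidden state: what gets passed onward
    return h, c

# Processing a sequence: feed inputs (here, arbitrary numbers) one at a time.
h, c = 0.0, 0.0
W = {"f": (0.5, 0.5, 0.0), "i": (0.5, 0.5, 0.0),
     "g": (0.5, 0.5, 0.0), "o": (0.5, 0.5, 0.0)}
for x in [0.2, -0.1, 0.7]:
    h, c = lstm_step(x, h, c, W)
```

Note how the cell state update `c = f * c_prev + i * g` is additive: when the forget gate is near 1, old memory flows through almost untouched, which is exactly what lets LSTMs hold onto context over long sequences.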

Simple example

Let’s say you’re feeding an LSTM the sentence:

“The clouds are dark and it looks like it’s going to…”

You want it to predict the next word. The correct prediction might be “rain”.

An LSTM is able to use context from earlier in the sentence (“clouds”, “dark”) to realize that “rain” is a more likely continuation than, say, “snow” or “shine”.

Where a basic RNN might forget the early part (“The clouds”), an LSTM can retain that memory, allowing it to make better predictions.
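You can see this forgetting-vs-retaining difference in a toy numeric comparison. The sketch below hand-picks weights (purely for illustration, nothing is learned here) and tracks how well each model carries a signal seen at step 1 across 50 "empty" time steps:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

signal = 1.0   # something important seen early, like "clouds"
steps = 50

# Plain RNN: the state is repeatedly squashed through tanh and shrinks.
h_rnn = signal
for _ in range(steps):
    h_rnn = math.tanh(0.5 * h_rnn)   # recurrent weight 0.5, no new input

# LSTM cell state: with the forget gate saturated near 1 and the input
# gate near 0, the additive update c = f*c + i*g leaves memory intact.
c = signal
for _ in range(steps):
    f = sigmoid(10.0)    # forget gate ~1: keep the memory
    i = sigmoid(-10.0)   # input gate ~0: write nothing new
    c = f * c + i * 0.0

print(h_rnn)  # tiny: the early signal has all but faded
print(c)      # still near 1: the LSTM kept it
```

This is of course a caricature (real gates react to the input at every step), but it captures why the gated, additive cell-state update gives LSTMs a much longer effective memory than a vanilla RNN's repeated squashing.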
