Hi! I watched the DL specialization and the NLP courses, but I still can’t understand how an LSTM works under the hood. So we have gates that pass information from words into a “pipe” (the cell state), which can carry extra information that future computations can build on. This also prevents gradient problems. OK, but how will my algorithm figure out the principles for selecting words and for using them later? Is that set by gradient descent when finding the best gate weights W? I can’t picture what this process looks like either.
What both GRU and LSTM do over and above a basic RNN is add explicit “gates” that make the hidden state the RNN learns more powerful and expressive. In principle, I suppose a “plain vanilla” RNN without explicit “update” and “forget” gates could use its hidden state to implement similar behavior. But LSTM and GRU make the function of those subsets of the hidden state explicit, and (apparently) that makes it easier for the usual backpropagation-based training to learn those complex behaviors. As with everything in any kind of supervised neural network, the learning happens by training the network on the training data using backpropagation, driven by a cost function chosen so that the learning is meaningful. As with everything, it either works or it doesn’t, right? Meaning that people must have tried plain vanilla RNNs and LSTMs on the same tasks, and LSTMs must perform better on some significant classes of problems, otherwise they wouldn’t be a big deal.
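If it helps to see the moving parts, here is a minimal NumPy sketch of one LSTM time step. The weight names (W_f, W_i, W_o, W_c) follow the standard notation from the literature; the shapes and random initialization are purely illustrative, not a real trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step: x_t is the current input vector, h_prev and
    c_prev are the previous hidden and cell states, p holds the weights."""
    z = np.concatenate([h_prev, x_t])            # standard [h, x] concatenation
    f = sigmoid(p["W_f"] @ z + p["b_f"])         # forget gate: what to erase from c
    i = sigmoid(p["W_i"] @ z + p["b_i"])         # input gate: what to write into c
    o = sigmoid(p["W_o"] @ z + p["b_o"])         # output gate: what to expose as h
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])   # candidate new cell contents
    c_t = f * c_prev + i * c_tilde               # mostly-additive update: this is
                                                 # what eases the gradient problems
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Illustrative shapes: hidden size 4, input size 3. The weights are random
# here, but in a real network gradient descent sets them during training.
rng = np.random.default_rng(0)
n_h, n_x = 4, 3
p = {w: rng.standard_normal((n_h, n_h + n_x)) for w in ("W_f", "W_i", "W_o", "W_c")}
p.update({b: np.zeros(n_h) for b in ("b_f", "b_i", "b_o", "b_c")})
h, c = lstm_step(rng.standard_normal(n_x), np.zeros(n_h), np.zeros(n_h), p)
```

Notice that nothing in that code decides what the gates mean. All of W_f, W_i, W_o, W_c and the biases are ordinary trainable parameters, so yes: the “selection principles” you asked about end up encoded in those weights by gradient descent, not in any rule someone wrote down.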
So how does the model understand whether the LSTM should handle, say, verb pluralization or adding ’s to the end of a word? And should we manually program these gates? If not, how does the model figure out how the gates should work?
You should not do anything manually with the gates, other than selecting whether you want to use a GRU or an LSTM model. The point is that it learns what works through training. Training a language model requires a lot of labelled data, of course.
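To make that concrete: in a framework like PyTorch, for example, the choice really is one line, and everything inside the gates is learned for you. This is just a sketch with arbitrary hyperparameters, not a recommended architecture:

```python
import torch.nn as nn

class TinyLanguageModel(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=256,
                 cell="lstm"):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # The only "gate decision" you ever make: which cell type to use.
        rnn_cls = nn.LSTM if cell == "lstm" else nn.GRU
        self.rnn = rnn_cls(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        h, _ = self.rnn(self.embed(token_ids))
        return self.out(h)   # next-token logits at every position

# Training this with the usual cross-entropy loss and backprop is what
# sets the gate weights; there is no step where you program them by hand.
model = TinyLanguageModel(cell="lstm")
```

The gradients of the loss flow back through the gate weights exactly as through any other layer, so “programming the gates” just means running the optimizer.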
So do you mean that when we train on a huge number of sentences, the model will figure out from a word’s embedding that some word earlier in the sentence is plural? And from that it should work out by itself that, if another noun subject appears in the sentence, it should cancel out the influence of the first noun subject?
How many training sentences would it take to make the model that smart?
There is no magic number, but it’s a lot.
Or to put it in more practical terms: enough that it works. You have to experimentally determine how much that is.
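One common way to run that experiment is a learning-curve sweep: train the same model on nested subsets of your data and watch where validation performance plateaus. A rough sketch follows; train_and_eval is a stand-in for whatever training loop you already have, stubbed here with a fake saturating curve just so the snippet runs:

```python
def train_and_eval(train_sentences, val_sentences):
    """Placeholder: train your model on train_sentences and return
    validation accuracy. The formula below is a fake saturating curve."""
    return 0.9 - 2.0 / (len(train_sentences) ** 0.25)

# Dummy corpora so the sketch is self-contained.
training_sentences = [f"sentence {i}" for i in range(1_000_000)]
validation_sentences = [f"held-out sentence {i}" for i in range(10_000)]

for n in [10_000, 50_000, 100_000, 500_000, 1_000_000]:
    acc = train_and_eval(training_sentences[:n], validation_sentences)
    print(f"{n:>9} sentences -> validation accuracy {acc:.3f}")
# When the curve flattens, extra data of the same kind is buying little;
# that plateau is the practical answer to "how much is enough".
```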
So, do all LSTMs work like this? Do they try to find correlations with characteristics of words seen in the past, or maybe combinations of characteristics, that should be passed forward to the future? And all of this to get the desired result whenever the detected patterns of word characteristics repeat in other sentences? Can you maybe add to my thoughts? Because it really looks like magic at first view. It must be really hard work for a model, without any explanations given to it… I sympathize.
Yes, that is the “magic” of LSTMs. There are lots of other good sources of information and intuition about LSTMs and RNNs in general. I found this article by Andrej Karpathy very helpful, and he also has a number of relevant lectures out on YouTube, e.g. this one.