Long Short Term Memory (LSTM) - Coursera

First, I understand that the purpose of LSTM is to remember the information for more timesteps. Is that right?
If that is the case, why is it influencing the activation value with an output gate (a = Γo ∗ tanh(c)) instead of just using the memory cell value?

Hi @Lakshmi_Narayana

I think the output gate determines the value of the next hidden state, which carries information about the previous inputs. First, the current input and the previous hidden state are passed through the third sigmoid function to produce the output gate Γo. Then the new cell state is passed through a tanh function. These two outputs are multiplied element-wise, and the result decides which information the hidden state should carry forward. That hidden state is what is used for prediction.

This is also one of the differences between the GRU and the LSTM: in a GRU, a = c directly, while the LSTM adds an output gate that finalizes the next hidden state. This paper (http://proceedings.mlr.press/v37/jozefowicz15.pdf) found the output gate to be the least important of the gates, but of course it still plays a necessary role.
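To make the gate interactions concrete, here is a minimal NumPy sketch of a single LSTM timestep using the course's notation (Γf, Γu, Γo for the forget, update, and output gates). The parameter names and shapes are my own illustrative choices, not the exact ones from the assignments:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, a_prev, c_prev, params):
    """One LSTM timestep. Shapes assumed: x is (n_x,), a_prev and
    c_prev are (n_a,), each W is (n_a, n_a + n_x), each b is (n_a,)."""
    concat = np.concatenate([a_prev, x])                     # stack previous activation and input
    gamma_f = sigmoid(params["Wf"] @ concat + params["bf"])  # forget gate: what to erase from c
    gamma_u = sigmoid(params["Wu"] @ concat + params["bu"])  # update gate: what to write to c
    gamma_o = sigmoid(params["Wo"] @ concat + params["bo"])  # output gate: what to expose as a
    c_tilde = np.tanh(params["Wc"] @ concat + params["bc"])  # candidate memory value
    c = gamma_f * c_prev + gamma_u * c_tilde                 # new memory cell
    a = gamma_o * np.tanh(c)                                 # the line in question: a = Γo * tanh(c)
    return a, c
```

Note the last line: the cell c can hold information across many timesteps unchanged, while Γo controls how much of that memory is exposed in the activation a at each step. Setting a = c instead (and merging the gates accordingly) recovers the GRU-style behavior discussed above.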

Thanks!
Abdelrahman

Yes, I understand that the activation value is sent to the next hidden state and hence has an effect on the next hidden state's outcome, as that is the whole point behind using RNNs (as per my understanding). What I don't understand is: why should the LSTM influence the activation, when its purpose is to select what is retained in the memory cell?

But if it only remembers state and never affects the outcome, then what is the point?

In other words, I think your model of the functions of the LSTM is too simplistic. It is not just watching for state that it needs to remember: it is also deciding when state that it remembered from earlier is relevant to modify the prediction of the next output and when to forget previously remembered state that is no longer relevant. And of course the high level point is that it doesn’t know how to do all those things de novo: it learns them through the training process. Prof Ng explained all this in the lectures. It might be a good idea to watch the LSTM lectures again with the things mentioned on this thread in mind. It should make more sense the second time through.