Long Short Term Memory (LSTM) - Coursera

First, I understand that the purpose of LSTM is to remember the information for more timesteps. Is that right?
If that is the case, why is it influencing the activation value with an output gate (a = Γo ∗ tanh(c)) instead of just using the memory cell value?

Hi @Lakshmi_Narayana

I think the output gate determines the value of the next hidden state, which carries information about the previous inputs. First, the current input and the previous hidden state are passed through the third sigmoid function to produce the output gate Γo. Then the new cell state is passed through a tanh function. These two outputs are multiplied element-wise, and the result decides which information the hidden state should carry forward. That hidden state is what is used for prediction.

This is also one of the differences between the GRU and the LSTM: in a GRU, a = c directly, while the LSTM adds an output gate that finalizes the next hidden state. This paper (http://proceedings.mlr.press/v37/jozefowicz15.pdf) found the output gate to be the least important of the gates, but of course it still plays a necessary role.
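To make the gate interactions concrete, here is a minimal NumPy sketch of a single LSTM timestep using the course's notation (Γf, Γu, Γo for the forget, update, and output gates). The parameter names and shapes are my own illustrative choices, not the exact ones from the assignments:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, a_prev, c_prev, params):
    """One LSTM timestep. Shapes assumed: x is (n_x,), a_prev and
    c_prev are (n_a,), each W is (n_a, n_a + n_x), each b is (n_a,)."""
    concat = np.concatenate([a_prev, x])                     # stack previous activation and input
    gamma_f = sigmoid(params["Wf"] @ concat + params["bf"])  # forget gate: what to erase from c
    gamma_u = sigmoid(params["Wu"] @ concat + params["bu"])  # update gate: what to write to c
    gamma_o = sigmoid(params["Wo"] @ concat + params["bo"])  # output gate: what to expose as a
    c_tilde = np.tanh(params["Wc"] @ concat + params["bc"])  # candidate memory value
    c = gamma_f * c_prev + gamma_u * c_tilde                 # new memory cell
    a = gamma_o * np.tanh(c)                                 # the line in question: a = Γo * tanh(c)
    return a, c
```

Note the last line: the cell c can hold information across many timesteps unchanged, while Γo controls how much of that memory is exposed in the activation a at each step. Setting a = c instead (and merging the gates accordingly) recovers the GRU-style behavior discussed above.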

Thanks!
Abdelrahman

Yes, I understand that the activation value is sent to the next hidden state and hence has an effect on the next hidden state's outcome, as that is the whole point behind using RNNs (as per my understanding). What I don't understand is: why should the LSTM influence the activation, when its purpose is to select what is retained in the memory cell?

But if it only remembers state and never affects the outcome, then what is the point?

In other words, I think your model of the functions of the LSTM is too simplistic. It is not just watching for state that it needs to remember: it is also deciding when state that it remembered from earlier is relevant to modify the prediction of the next output and when to forget previously remembered state that is no longer relevant. And of course the high level point is that it doesn’t know how to do all those things de novo: it learns them through the training process. Prof Ng explained all this in the lectures. It might be a good idea to watch the LSTM lectures again with the things mentioned on this thread in mind. It should make more sense the second time through.