Hi Team,
I noticed that in the lecture on GRUs, the memory cell is set equal to the hidden state (c^{<t>} = a^{<t>}). But if the memory cell “retains” a characteristic value over the sequence, wouldn't that imply a constant hidden state over the sequence? Then how would the model predict a different output for each element?
Please specify the lecture / timestamp.
The output at each timestep depends not just on the cell state carried across time, but also on the input at that timestep. With this in mind, why would the output be the same at every timestep?
Yes, the whole point of an RNN is that the “cell state” (also called the “hidden state” or “memory state”) changes at each timestep. How it changes, based on the inputs (both the x^{<t>} and a^{<t-1>} values), is determined by the parameters (weights) learned during training. With the GRU and LSTM architectures that cell state gets more complex, of course, but the high-level point is the same: it changes at every timestep.
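To make that concrete, here is a minimal NumPy sketch of a single vanilla RNN step, a^{<t>} = tanh(W_a [a^{<t-1>}; x^{<t>}] + b_a). The dimensions and random weights here are hypothetical, purely to show that the hidden state changes whenever the input does:

```python
import numpy as np

# Hypothetical sizes, just for illustration.
n_a, n_x = 4, 3  # hidden state size, input size

rng = np.random.default_rng(0)
Wa = rng.standard_normal((n_a, n_a + n_x)) * 0.1  # weights applied to [a_prev; x_t]
ba = np.zeros((n_a, 1))

def rnn_step(a_prev, x_t):
    """One vanilla RNN step: a^<t> = tanh(Wa @ [a^<t-1>; x^<t>] + ba)."""
    return np.tanh(Wa @ np.concatenate([a_prev, x_t], axis=0) + ba)

a = np.zeros((n_a, 1))
for t in range(3):
    x_t = rng.standard_normal((n_x, 1))  # a new input arrives at each timestep
    a = rnn_step(a, x_t)
    print(f"t={t}: a = {a.ravel()}")  # the hidden state changes every step
```

In a trained network W_a and b_a would come from training rather than random initialization, but the mechanics are the same: since x^{<t>} differs at each timestep, a^{<t>} differs too.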
It’s been a couple of years since I listened to the lectures in DLS C5 W1, but I’m sure that Prof Ng discusses this. As with the feed-forward nets in C1 and the ConvNets in C4, we can’t say with certainty exactly what the hidden state is doing and how it is doing it, but the idea is that it remembers things like “we’ve seen the subject of the sentence” and “the subject was plural” and uses that to determine later behavior. The behavior is learned from the training data. There was a lecture in DLS C4 called “What are Deep ConvNets Learning” that describes some really interesting research in which they instrument the neurons in hidden layers of a ConvNet to see how they work and what they “see”, but I don’t know if anyone has done similar research with RNNs.
Thanks Ambresh and Paulin, your responses really help. Correct me if I am wrong: say the memory cell vector stores the information that ‘cat’ is a singular entity in one of its elements (say c[cat]). Now, as the model progresses to identifying new words, the element c[cat] doesn’t change, but the other values in the memory cell change over time and help generate new outputs.
Is my inference correct?
I don’t think we can make the kind of literal interpretation you are proposing, but something like what you suggest may well happen. E.g. the fact that the subject of the sentence has been seen, which word in the sequence it was, and whether it was singular or plural would not, in general, change as you process the later words in the sentence. The point is that we don’t really know whether there is a single “bit” or element of the hidden state that encodes each of those conditions. You would need to do the kind of instrumentation research described in the ConvNet lecture I pointed out.
It might well be the case that some elements of the hidden state change only once across the timesteps and then remain constant, while other elements very likely change at every timestep. But even there, you could imagine a case in which the current timestep is a word that is like a “stop word” but didn’t get pruned in preprocessing for some reason: a word that normally carries semantic effect, but in this particular context is a semantic NOP.
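Here is a minimal sketch of the mechanism that makes that possible: the simplified GRU update from the lecture, c^{<t>} = Γ_u * c̃^{<t>} + (1 − Γ_u) * c^{<t−1>}, applied element-wise. The gate values below are hand-picked rather than learned, purely to illustrate how some elements can stay constant while others change:

```python
import numpy as np

# Simplified GRU update from the lecture:
#   c^<t> = gamma_u * c_tilde^<t> + (1 - gamma_u) * c^<t-1>
# In a real GRU, gamma_u comes from a sigmoid over learned weights applied
# to [c^<t-1>, x^<t>]; here it is hand-picked per element for illustration.
c_prev  = np.array([1.0, -0.5, 0.3])   # previous memory cell
c_tilde = np.array([0.2,  0.9, -0.7])  # candidate replacement value
gamma_u = np.array([0.0,  1.0,  0.5])  # update gate, element-wise in [0, 1]

c = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev
print(c)  # -> [ 1.   0.9 -0.2]
# Element 0 (gate ~ 0) keeps its old value, like a remembered
# "the subject was singular" flag; element 1 (gate ~ 1) is fully
# overwritten by the candidate; element 2 is a blend of old and new.
```

Because the gate is a separate value per element, nothing forces the whole cell to change (or stay constant) at once, which is exactly why a GRU can “retain” some information while still producing a different hidden state, and hence a different output, at each timestep.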