An RNN has short-term memory. But what is the duration of that memory, in terms of the number of time steps?

Thanks in advance.

It’s the size of the hidden states.

@TMosh can you clarify a bit more?

I presume you don’t mean the number of units/layers -- so do you mean the size/dimension of the activations?

(I ask because I had this question on the back of my mind too)

It means the number of hidden-state units. Increasing this number lets the RNN capture more complex patterns and carry more context about the data (more memory). But it has a limitation: the vanishing gradient problem still bounds how far back that memory effectively reaches.
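To make this concrete, here is a minimal vanilla-RNN step sketched in NumPy (the sizes and weight scales here are made up for illustration): everything the network "remembers" at a given time step lives in one hidden-state vector, so its dimension is exactly the number of hidden units.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, purely for illustration.
input_size, hidden_size = 3, 8   # hidden_size = number of hidden-state units

# Vanilla RNN parameters: h_t = tanh(Wx @ x_t + Wh @ h_{t-1} + b)
Wx = rng.standard_normal((hidden_size, input_size)) * 0.1
Wh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)  # the entire "memory" is this one vector
for t in range(5):         # five time steps of dummy input
    x_t = rng.standard_normal(input_size)
    h = np.tanh(Wx @ x_t + Wh @ h + b)

print(h.shape)  # the memory capacity is fixed by hidden_size
```

Whatever the sequence length, the state passed forward is always this fixed-size vector, which is why a larger `hidden_size` means more capacity to summarize past context.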

The answer to this question depends entirely on the data. For some data the answer might be 3 time steps, and for some it might be 5. There is no universal answer…
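One way to see that there is no fixed number is to measure how quickly the influence of an early hidden state fades. Below is a rough, purely illustrative sketch (a vanilla RNN with small random weights, no training): the norm of the Jacobian of the current hidden state with respect to the initial one shrinks as the gap grows, which is the vanishing-gradient effect that limits usable memory. With different weights (i.e., weights shaped by different data), the decay rate, and hence the effective memory horizon, would differ.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_size = 8

# Illustrative random recurrent weights, scaled small so gradients vanish.
Wh = rng.standard_normal((hidden_size, hidden_size)) * 0.2

h = rng.standard_normal(hidden_size)
J = np.eye(hidden_size)   # Jacobian of h_t with respect to h_0
norms = []
for t in range(20):
    h = np.tanh(Wh @ h)
    J = np.diag(1.0 - h**2) @ Wh @ J   # chain rule through tanh and Wh
    norms.append(np.linalg.norm(J))

# The influence of h_0 on h_t decays with t; how fast it decays depends
# on the weights, so "memory duration" has no single universal value.
print(norms[0], norms[-1])
```

Gated architectures (LSTM/GRU) were introduced precisely to slow this decay and stretch the memory horizon.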