Context Length and Exponential Increase in RNN Memory (Parameters)

In principle, an RNN can be unrolled over an arbitrarily long context, independent of its number of parameters. However, in the lecture video titled "Text generation before Transformers", the instructor says otherwise.
How does the number of parameters in an RNN increase exponentially with the context window size? Or did I misunderstand the statement?

Right, how many parameters does one RNN have? Also remember that the RNN has a memory, which might be bidirectional. If you increase the context, that memory needs to be longer, and more cells of the RNN contribute to the prediction.


Thanks, I got your point. I still feel there is a gray area in my understanding. Suppose we have one layer comprised of two independent RNNs (M parameters each) to learn the context in both directions. Typically we get contextualized representations for words from RNN models by training the model to predict the next/previous token given the history, i.e., P(t_k|t_1,t_2,...,t_{k-1}). For example, ELMo.

So I can see that the STORAGE memory increases, since we store the hidden state vector of each time step (later used for BPTT, attention, concatenation, etc.). However, the parameters of the network remain the same because they are shared across time steps, right? In the lecture, the term "memory" seems to be used loosely to refer to the "parameters" of the model.
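To make the distinction concrete, here is a minimal vanilla-RNN sketch (my own illustration, not the lecture's model; the hidden size 4 and input size 3 are arbitrary). The parameter count is fixed regardless of context length, while the number of stored hidden states grows linearly with the number of time steps:

```python
import numpy as np

H, D = 4, 3  # hidden size, input size (arbitrary for illustration)

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(H, D))   # input-to-hidden weights
W_hh = rng.normal(size=(H, H))   # hidden-to-hidden weights, shared across all steps
b_h = np.zeros(H)                # hidden bias

def n_params():
    return W_xh.size + W_hh.size + b_h.size

def run(xs):
    """Unroll the RNN over xs, keeping every hidden state (as BPTT would)."""
    h = np.zeros(H)
    states = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states

short = run(rng.normal(size=(5, D)))    # context of 5 steps
long = run(rng.normal(size=(500, D)))   # context of 500 steps

print(n_params())              # same parameter count for any context length
print(len(short), len(long))   # stored activations grow with context length
```

Note that even the activation storage here grows linearly, not exponentially, with context length, which is part of what prompted my question.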

Since the instructor said that the "compute and memory" requirements grow exponentially, I presume he is referring to the storage memory rather than the parameters.