Why do we use the same parameters for different timesteps in an RNN?

In the RNN architecture we use the same parameters at every timestep of the model. Why do we do this? Wouldn't it be better to have unique weights and biases for each timestep? Is the problem the large number of parameters, or that the notation would be cumbersome, or is there some other reason?
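For concreteness, I mean the standard Elman-style recurrence, where the same `W_xh`, `W_hh`, and `b` are applied at every timestep (a minimal NumPy sketch; the names are just for illustration):

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b, h0):
    """Run an Elman-style RNN over a sequence.

    The same W_xh, W_hh, and b are reused at every timestep.
    """
    h = h0
    states = []
    for x_t in x_seq:  # one iteration per timestep
        h = np.tanh(W_xh @ x_t + W_hh @ h + b)
        states.append(h)
    return states
```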

Such a system (with a full set of parameters for each timestep) would be very computationally expensive to train, and would likely overfit badly to the training set.
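To put rough numbers on that: with input size `D`, hidden size `H`, and sequence length `T`, per-timestep weights multiply the parameter count by `T` (a back-of-the-envelope sketch; the sizes are made up):

```python
D, H, T = 300, 512, 100                     # illustrative sizes

shared = H * D + H * H + H                  # one W_xh, W_hh, b reused everywhere
per_timestep = T * shared                   # a fresh copy for every timestep

print(f"shared:       {shared:,}")          # ~416k parameters
print(f"per-timestep: {per_timestep:,}")    # ~41.6M parameters
```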

Right! The network as defined already takes into account what came before in the sequence. To use the language parsing example: meaning is determined by the local context, not by the exact timestep at which a given phrase occurs.
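A related consequence: because nothing in the forward pass depends on the absolute timestep, the same weights can process sequences of any length. Continuing the sketch above (with random toy data; `rnn_forward` is defined in the first snippet):

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 4, 8
W_xh = rng.standard_normal((H, D)) * 0.1
W_hh = rng.standard_normal((H, H)) * 0.1
b = np.zeros(H)
h0 = np.zeros(H)

# The identical parameters handle a 5-step and a 12-step sequence.
short_seq = [rng.standard_normal(D) for _ in range(5)]
long_seq = [rng.standard_normal(D) for _ in range(12)]
print(len(rnn_forward(short_seq, W_xh, W_hh, b, h0)))   # 5
print(len(rnn_forward(long_seq, W_xh, W_hh, b, h0)))    # 12
```

With per-timestep parameters, neither of these would work unless the training data contained sequences of exactly that length.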