Yes, the architecture of an RNN is that there is only one cell, which is simply applied repeatedly at each of the "timesteps". Note that when we train the network, the gradient contributions are generally different at each timestep, but they are all accumulated into the same shared weights. Also note that the cell's architecture gets quite a bit more complicated once you move to LSTM, so there are several separate sets of weights, one per gate (forget gate, update gate, output gate).
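To make the weight sharing concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass (all sizes and variable names here are illustrative, not from any particular course or library). The point is that the same `Wx`, `Wh`, and `b` are used at every timestep of the loop:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, T = 3, 4, 5          # input size, hidden size, number of timesteps

# One cell means ONE set of weights, created once outside the time loop.
Wx = rng.standard_normal((n_h, n_x)) * 0.1   # input-to-hidden weights
Wh = rng.standard_normal((n_h, n_h)) * 0.1   # hidden-to-hidden weights
b = np.zeros(n_h)

h = np.zeros(n_h)
for t in range(T):
    x_t = rng.standard_normal(n_x)           # the input at timestep t
    # The SAME Wx, Wh, b are reused at every timestep; during backprop,
    # the per-timestep gradients would all be summed into these arrays.
    h = np.tanh(Wx @ x_t + Wh @ h + b)

print(h.shape)
```

An LSTM cell would look structurally the same, except that instead of one `(Wx, Wh, b)` triple it would hold one such set per gate, all still shared across timesteps.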