Backpropagation in RNN weight sharing

I have a few doubts which I will put forth below.

1.) What do we mean when we say an RNN shares weights? Is it that all three weight matrices (input, hidden, and output) are the same across all time steps? But when we backpropagate, these get changed, right? Won't the weight matrix at each time step get updated based on its own gradient? If so, how is it weight sharing?

Well, each layer contains a number of RNN or LSTM units, like 16, 32, or more… so across a network we do have plenty of different weights.
And yes, each RNN or LSTM cell shares its weights across the time steps. The answer to this is detailed in the most upvoted answer here…
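As a quick concrete check (in PyTorch, assuming that is the framework in use; the sizes are arbitrary), an LSTM layer with 32 hidden units has one fixed set of weight matrices whose shapes depend only on the input and hidden sizes, never on how many time steps you feed it:

```python
import torch.nn as nn

# Hypothetical sizes for illustration: 10 input features, 32 hidden units
lstm = nn.LSTM(input_size=10, hidden_size=32)

# Parameter shapes depend only on input_size and hidden_size, not on the
# sequence length: the same matrices are reused at every time step.
for name, p in lstm.named_parameters():
    print(name, tuple(p.shape))
# weight_ih_l0 (128, 10)   -> 4 gates x 32 hidden units, input weights
# weight_hh_l0 (128, 32)   -> 4 gates x 32 hidden units, recurrent weights
# bias_ih_l0   (128,)
# bias_hh_l0   (128,)
```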

I have not read the StackExchange article yet, but I think there’s a fairly clear way to state the answer:

Yes, there is one set of weights, and it is used at every time step. During backpropagation there is a different gradient contribution from each time step, but those contributions are summed and the total is applied to the same shared set of weights. So you still end up with a single shared set of weights after applying backpropagation.

Of course, the exact number and structure of the weights depend on what features you implement in your particular RNN setup (LSTM or not, and the various other choices). But for a given architecture, the description above applies.
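To make the "summed contributions" point concrete, here is a minimal NumPy sketch of backpropagation through time for a vanilla RNN (the sizes, the toy loss, and the variable names are all just illustrative assumptions). The backward loop adds each time step's gradient into the same `dWxh` and `dWhh`, and only then is a single update applied to the one shared `Wxh` and `Whh`:

```python
import numpy as np

rng = np.random.default_rng(0)

T, input_size, hidden_size = 5, 3, 4                  # sequence length and layer sizes
Wxh = rng.normal(0, 0.1, (hidden_size, input_size))   # ONE input-to-hidden matrix
Whh = rng.normal(0, 0.1, (hidden_size, hidden_size))  # ONE hidden-to-hidden matrix
bh  = np.zeros(hidden_size)

xs = rng.normal(size=(T, input_size))
hs = [np.zeros(hidden_size)]                          # h_0

# Forward pass: the SAME Wxh/Whh are reused at every time step
for t in range(T):
    hs.append(np.tanh(Wxh @ xs[t] + Whh @ hs[-1] + bh))

loss = 0.5 * np.sum(hs[-1] ** 2)                      # toy loss on the final hidden state

# Backward pass (BPTT): each step produces its own gradient contribution,
# but all of them are accumulated into the single shared dWxh/dWhh
dWxh, dWhh, dbh = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(bh)
dh = hs[-1]                                           # dL/dh_T for the toy loss
for t in reversed(range(T)):
    dz = dh * (1.0 - hs[t + 1] ** 2)                  # backprop through tanh
    dWxh += np.outer(dz, xs[t])                       # contribution from time step t
    dWhh += np.outer(dz, hs[t])
    dbh  += dz
    dh = Whh.T @ dz                                   # pass gradient to the previous step

# One update applied to the one shared set of weights
lr = 0.1
Wxh -= lr * dWxh
Whh -= lr * dWhh
bh  -= lr * dbh
```

At no point does a second copy of `Wxh` or `Whh` appear; the per-time-step information lives only in the accumulated gradients.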

Hello sir,

I lost you at the point where you said "applied to the same shared set of weights". My doubt is: once we update the weights based on their gradients, don't we end up with a different weight matrix for each time step? So how are the weights shared when they are different after one step of backpropagation?

The point is that the gradients are applied to the same set of weights. Each time step contributes a different gradient, but those contributions are summed into a single gradient, and that one gradient updates the one shared weight matrix. There is never a separate copy of the weights per time step, so there is nothing that could end up different for each time step.
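Here is a small sketch of that in PyTorch (the sizes, seed, and toy loss are arbitrary assumptions for illustration): one backward pass over a ten-step sequence produces exactly one gradient tensor per weight matrix, already summed over the time steps, and one optimizer step leaves you with the same single set of weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=3, hidden_size=4)      # one weight_ih, one weight_hh
opt = torch.optim.SGD(rnn.parameters(), lr=0.1)

x = torch.randn(10, 1, 3)                      # 10 time steps, batch of 1
out, h = rnn(x)
loss = out.pow(2).mean()                       # toy loss over all time steps
loss.backward()

# Exactly ONE gradient tensor per weight matrix; the contributions from
# all 10 time steps have already been summed into it.
print(rnn.weight_hh_l0.shape, rnn.weight_hh_l0.grad.shape)  # both (4, 4)

opt.step()                                     # one update to the one shared matrix
print([n for n, _ in rnn.named_parameters()])
# ['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'] -- still a single set
```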