I have a basic theoretical question related to the RNN introductory videos. When backpropagation is explained for a typical many-to-many RNN, the entire sequence is used for a single forward pass and backprop update (with the loss being the sum of the individual losses on the y_hat<i> outputs).
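For concreteness, here is a minimal PyTorch sketch of what I understand the videos to describe (this is my own illustration, not course code; the names x, y, rnn_cell, readout and all the shapes are made up):

```python
# Full-sequence BPTT for a many-to-many RNN: one forward pass over the whole
# sequence, losses summed over all y_hat<t>, then a single backward/update.
import torch
import torch.nn as nn

T, batch, n_in, n_hidden, n_out = 50, 8, 10, 32, 5
x = torch.randn(T, batch, n_in)            # hypothetical input sequence
y = torch.randint(0, n_out, (T, batch))    # hypothetical per-timestep targets

rnn_cell = nn.RNNCell(n_in, n_hidden)
readout = nn.Linear(n_hidden, n_out)
params = list(rnn_cell.parameters()) + list(readout.parameters())
opt = torch.optim.SGD(params, lr=0.01)
loss_fn = nn.CrossEntropyLoss()

a = torch.zeros(batch, n_hidden)           # a<0>
total_loss = 0.0
for t in range(T):                         # forward pass over the entire sequence
    a = rnn_cell(x[t], a)                  # a<t>
    y_hat = readout(a)                     # y_hat<t>
    total_loss = total_loss + loss_fn(y_hat, y[t])

opt.zero_grad()
total_loss.backward()                      # single backprop through all T steps
opt.step()
```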
My question is: why can't we do a forward/backward update step by step (i.e., do a forward/backward update for the first timestep, then use the updated weights to train on the next timestep)? Note that the history/hidden state (a<i>) is still carried over (I am not talking about a trivial one-to-one MLP). It seems this would "mitigate" the vanishing gradient problem (the problem of retaining old information in the hidden state remains, but that is somewhat different from the original vanishing gradient problem).
Or, asking the same question from a slightly different angle: if the sequences are extremely long, you definitely want to break up the forward/backprop updates into segments (while carrying the hidden state over across segments). What is the trade-off in choosing the segment length (or even just picking one step at a time)? Is it purely a matter of computational efficiency, or is the whole learning process compromised?
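And here is a sketch of what I mean by breaking the updates into segments, reusing the definitions from the snippet above (again my own construction, essentially truncated BPTT; the segment length k is a made-up hyperparameter, and k = 1 is the per-timestep variant I asked about first):

```python
# Truncated BPTT: the hidden state is carried across segments but detached,
# so gradients only flow back at most k steps, and the weights are updated
# after every segment rather than once per sequence.
k = 10                                     # segment length (k = 1 -> per-step updates)
a = torch.zeros(batch, n_hidden)

for start in range(0, T, k):
    a = a.detach()                         # keep the value of a<t>, cut the gradient path
    seg_loss = 0.0
    for t in range(start, min(start + k, T)):
        a = rnn_cell(x[t], a)
        y_hat = readout(a)
        seg_loss = seg_loss + loss_fn(y_hat, y[t])

    opt.zero_grad()
    seg_loss.backward()                    # backprop only through this segment
    opt.step()                             # weights change before the next segment
```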