RNN backpropagation for each time step

I have a basic theoretical question related to the RNN introductory videos. When backpropagation is explained for a typical many-to-many RNN, the entire sequence is used for a single forward pass and backprop update (the loss being the sum of the individual loss values on the y_hat^{<t>} outputs).

My question is: why can't we do a forward/backward update step by step (i.e. do a forward/backward update for the first time step, then use the updated weights to train on the next time step)? Note that the history/hidden state (a^{<t>}) is still carried over (I am not talking about a trivial one-to-one MLP). It seems this would “mitigate” the vanishing gradient problem (although the problem of retaining old information in the hidden state remains, that is somewhat different from the original vanishing gradient problem).

Or, asking the same question from a slightly different angle: if the sequences are extremely long, you definitely want to break up the forward/backprop updates into segments (while carrying over the hidden state across segments). What is the trade-off in selecting the segment length (or even just picking 1 step at a time)? Is it just a matter of computational efficiency, or is the whole learning process compromised?
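For concreteness, here is a minimal sketch of the segmented scheme I mean (in PyTorch, which the videos do not use; the sizes and the name SEG_LEN are made up for illustration). The hidden state is carried across segments, but each parameter update only backpropagates through the last SEG_LEN steps; setting SEG_LEN = 1 gives the per-time-step variant from my first question.

```python
import torch
import torch.nn as nn

# Toy many-to-many RNN; sizes and SEG_LEN are arbitrary for illustration.
input_size, hidden_size, output_size, SEG_LEN = 8, 16, 8, 20

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, output_size)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)
loss_fn = nn.MSELoss()

def train_long_sequence(x, y):
    """x: (1, T, input_size), y: (1, T, output_size), with T much larger than SEG_LEN."""
    h = torch.zeros(1, 1, hidden_size)          # initial hidden state a^{<0>}
    for start in range(0, x.size(1), SEG_LEN):
        xs = x[:, start:start + SEG_LEN]
        ys = y[:, start:start + SEG_LEN]
        out, h = rnn(xs, h)                     # forward pass over one segment
        loss = loss_fn(head(out), ys)           # per-step losses within the segment
        opt.zero_grad()
        loss.backward()                         # backprop reaches back at most SEG_LEN steps
        opt.step()
        h = h.detach()                          # carry the state forward, but cut the graph
```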

I suspect computational efficiency plays a big role.

If the hidden state a^{<t>} is carried over to t+1, then enough information is propagated forward during the forward pass to cover (long-term) dependencies between time steps t_1 and t_2 > t_1.

But I guess (see also the bottom of Backpropagation Through Time and Vanishing Gradient (RNN) - #5 by David_Farago) that in order to learn these (long-term) dependencies between t_1 and t_2, errors need to be propagated back from t_2 to t_1 during backprop. So you cannot do backprop for each time step in isolation.
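A tiny sketch of that last point (again PyTorch, a made-up two-step scalar example, not anything from the course): if the hidden state is detached between steps, the loss at t_2 contributes nothing to the gradient through the computation at t_1, so a dependency that spans the two time steps cannot be learned from that error signal.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
x1, x2 = torch.tensor(1.0), torch.tensor(1.0)

# Full BPTT over both steps: the step-2 loss sends gradient back through step 1.
h1 = torch.tanh(w * x1)
h2 = torch.tanh(w * x2 + h1)
(h2 ** 2).backward()
print(w.grad)        # includes a contribution via h1

w.grad = None

# Per-step updates (segment length 1): detaching h1 cuts the t_1 -> t_2 path.
h1 = torch.tanh(w * x1)
h2 = torch.tanh(w * x2 + h1.detach())
(h2 ** 2).backward()
print(w.grad)        # no contribution via h1
```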