I understand how Backpropagation Through Time (BPTT) works in a many-to-one RNN architecture.

For example, dW_aa, the partial derivative of the loss function with respect to W_aa, is given by the following equation (using BPTT):

(Correct me if this is still wrong)

But when it comes to a many-to-many RNN:

I’m not confident that my understanding is correct, so please check it. Is this right? All I added to the previous equation is the loss associated with each output unit.
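
To make the question concrete, here is a sketch of the gradient I have in mind. The notation is an assumption on my part: a standard recurrence a^<t> = tanh(W_aa a^<t-1> + W_ax x^<t> + b_a), one loss L^<t> per output step, and a total loss L equal to the sum of the L^<t>. The symbol ∂⁺ stands for the immediate derivative that treats a^<k-1> as a constant, and the chain rule is written loosely, ignoring tensor shapes.

```latex
\frac{\partial L}{\partial W_{aa}}
  = \sum_{t=1}^{T_x} \sum_{k=1}^{t}
    \frac{\partial L^{\langle t \rangle}}{\partial a^{\langle t \rangle}}
    \left( \prod_{j=k+1}^{t}
      \frac{\partial a^{\langle j \rangle}}{\partial a^{\langle j-1 \rangle}} \right)
    \frac{\partial^{+} a^{\langle k \rangle}}{\partial W_{aa}}
```

Under these assumptions, the many-to-one case keeps only the outer term with t = T_x, so it has T_x terms, while the many-to-many case has one term for every pair (t, k) with 1 ≤ k ≤ t ≤ T_x.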

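So the number of terms in the summation is (t_x) + (t_x - 1) + (t_x - 2) + … + (1) = t_x (t_x + 1) / 2. This is a triangular number, not the factorial (t_x)!, which would be the product of those integers rather than their sum.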
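
A quick way to check the count is to enumerate the terms directly. The sketch below assumes one term per pair (t, k), where t is the output step whose loss is being differentiated and k is the earlier step at which W_aa is used. The function name `count_bptt_terms` is hypothetical and only serves this illustration.

```python
from math import factorial


def count_bptt_terms(t_x):
    """Count the (t, k) pairs with 1 <= k <= t <= t_x.

    Assumption: each pair contributes one term to dL/dW_aa in a
    many-to-many RNN with a loss at every time step.
    """
    return sum(1 for t in range(1, t_x + 1) for k in range(1, t + 1))


# Compare the enumerated count with the triangular-number formula
# and with the factorial, for a few sequence lengths.
for t_x in range(1, 7):
    terms = count_bptt_terms(t_x)
    triangular = t_x * (t_x + 1) // 2
    print(t_x, terms, triangular, factorial(t_x))
```

The enumerated count matches t_x (t_x + 1) / 2 for every length, and it differs from the factorial for every t_x other than 1 and 3.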