BPTT for many-to-many RNN architecture

wallik2 · February 10, 2023, 7:48am

I understand how Backpropagation Through Time (BPTT) works in a many-to-one RNN architecture.

For example, dW_aa, the partial derivative of the loss function with respect to W_aa, is given by the following equation (using BPTT):

(Correct me if this is still wrong)
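
For reference, the many-to-one case is usually written along the lines below. This is a sketch under assumed notation, not necessarily the exact equation from the original post: the hidden state is a^{<t>} = tanh(W_aa a^{<t-1>} + W_ax x^{<t>} + b_a), T_x stands for the sequence length (t_x in this thread), the loss L depends only on the final state a^{<T_x>}, and ∂⁺ denotes the immediate partial derivative that treats a^{<k-1>} as a constant. The expression is written loosely, ignoring transposes and the ordering of matrix products.

```latex
\frac{\partial \mathcal{L}}{\partial W_{aa}}
  = \sum_{k=1}^{T_x}
    \frac{\partial \mathcal{L}}{\partial a^{\langle T_x \rangle}}
    \left( \prod_{j=k+1}^{T_x}
      \frac{\partial a^{\langle j \rangle}}{\partial a^{\langle j-1 \rangle}} \right)
    \frac{\partial^{+} a^{\langle k \rangle}}{\partial W_{aa}}
```

The sum has T_x terms, one for each time step at which W_aa is applied; for k = T_x the product is empty and equals the identity.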

But when it comes to a many-to-many RNN, I'm not confident that my understanding is correct, so please check it. Is this true? All I add to the previous equation is the loss associated with each output unit.
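
A common way to write the many-to-many case is sketched below, under assumed notation that may differ from the original post: a^{<t>} = tanh(W_aa a^{<t-1>} + W_ax x^{<t>} + b_a), T_x is the sequence length (t_x in this thread), the total loss is the sum of per-step losses L^{<t>}, each of which depends on a^{<t>} through the output at step t, and ∂⁺ is the immediate partial derivative that holds a^{<k-1>} fixed. Transposes and the ordering of matrix products are ignored.

```latex
\mathcal{L} = \sum_{t=1}^{T_x} \mathcal{L}^{\langle t \rangle},
\qquad
\frac{\partial \mathcal{L}}{\partial W_{aa}}
  = \sum_{t=1}^{T_x} \sum_{k=1}^{t}
    \frac{\partial \mathcal{L}^{\langle t \rangle}}{\partial a^{\langle t \rangle}}
    \left( \prod_{j=k+1}^{t}
      \frac{\partial a^{\langle j \rangle}}{\partial a^{\langle j-1 \rangle}} \right)
    \frac{\partial^{+} a^{\langle k \rangle}}{\partial W_{aa}}
```

Each output step t contributes t terms (one for every earlier or equal step k), which is where the extra outer sum comes from compared with the many-to-one case.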

So, the number of terms in the summation is (t_x) + (t_x - 1) + (t_x - 2) + … + (1) = t_x(t_x + 1)/2, which is a triangular number rather than the factorial (t_x)!.
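
Both the double-sum form and the term count can be checked numerically. The snippet below is a minimal sketch, not the course's implementation: it assumes a scalar RNN a_t = tanh(w * a_{t-1} + u * x_t) with a squared-error loss at every step, and all names (`forward`, `loss`, `grad_w_double_sum`, the sample values) are hypothetical. It accumulates one term per pair (t, k) with 1 <= k <= t <= t_x and compares the result with a finite-difference gradient.

```python
import math

def forward(w, u, xs, a0=0.0):
    """Scalar RNN: a_t = tanh(w * a_{t-1} + u * x_t). Returns [a_0, ..., a_T]."""
    a = [a0]
    for x in xs:
        a.append(math.tanh(w * a[-1] + u * x))
    return a

def loss(w, u, xs, ys):
    """Many-to-many loss: sum over steps of 0.5 * (a_t - y_t)^2."""
    a = forward(w, u, xs)
    return sum(0.5 * (a[t] - ys[t - 1]) ** 2 for t in range(1, len(a)))

def grad_w_double_sum(w, u, xs, ys):
    """dL/dw as an explicit double sum over (t, k); also counts the terms."""
    a = forward(w, u, xs)
    grad, n_terms = 0.0, 0
    for t in range(1, len(xs) + 1):          # output step whose loss we differentiate
        dL_da_t = a[t] - ys[t - 1]           # dL_t / da_t
        for k in range(1, t + 1):            # step at which w is "touched"
            chain = 1.0
            for j in range(k + 1, t + 1):    # da_j / da_{j-1} = (1 - a_j^2) * w
                chain *= (1.0 - a[j] ** 2) * w
            immediate = (1.0 - a[k] ** 2) * a[k - 1]  # a_{k-1} held fixed
            grad += dL_da_t * chain * immediate
            n_terms += 1
    return grad, n_terms

# Hypothetical sample data with t_x = 4.
xs = [0.5, -1.0, 0.3, 0.8]
ys = [0.1, -0.2, 0.0, 0.4]
w, u = 0.7, -0.4

grad, n_terms = grad_w_double_sum(w, u, xs, ys)

# Central finite difference as an independent check of the gradient.
eps = 1e-6
numeric = (loss(w + eps, u, xs, ys) - loss(w - eps, u, xs, ys)) / (2 * eps)

t_x = len(xs)
print("double-sum gradient:", grad)
print("finite-difference gradient:", numeric)
print("terms:", n_terms, "closed form:", t_x * (t_x + 1) // 2,
      "factorial:", math.factorial(t_x))
```

The triangular number and the factorial happen to coincide for t_x = 1 and t_x = 3, so a short sequence can hide the difference; they diverge from t_x = 4 onward. Practical implementations avoid the double sum by carrying the gradient with respect to the hidden state backward one step at a time, which yields the same value in a single pass over the sequence.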

reinoudbosch · February 15, 2023, 10:36pm

Hi wallik2,

This blogpost may clarify (scroll down to BPTT).
