For the backpropagation through time algorithm, do you need to compute derivatives with respect to all of the inputs to each activation? Each activation “a” has several inputs: the weights Wa, the bias ba, the activation a from the previous time step, and the input X for the current time step. Could you post the full backpropagation through time algorithm here using this course’s notation? Or is it covered in a later lecture?
This is a good general resource on how backpropagation works.
Note that this page uses a different cost function, so it isn’t exactly applicable. But you can see how the math works.
In addition to the resources that Tom has given us, note that the first programming assignment in C5 W1 (Building Your Recurrent Neural Network - Step by Step) includes an optional (ungraded) section that walks you through implementing backpropagation for the RNN with LSTM.
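To make the original question concrete, here is a minimal NumPy sketch of backpropagation through time for a plain tanh RNN cell (not the LSTM). It is an illustration under stated assumptions, not the assignment's reference solution: it assumes Wa is split into Wax and Waa, that the cell computes a_next = tanh(Wax x + Waa a_prev + ba), and that the gradient of the loss with respect to each activation coming directly from that step's output is already available as `da`. The function names are hypothetical. Under those assumptions the answer is yes: each cell's backward step produces a gradient for every one of its inputs (Wax, Waa, ba, a_prev, and x).

```python
import numpy as np

def rnn_cell_forward(x_t, a_prev, Wax, Waa, ba):
    # Assumed cell: a<t> = tanh(Wax x<t> + Waa a<t-1> + ba)
    return np.tanh(Wax @ x_t + Waa @ a_prev + ba)

def rnn_cell_backward(da_next, a_next, x_t, a_prev, Wax, Waa):
    # da_next is the total gradient of the loss w.r.t. a<t>.
    # tanh'(z) = 1 - tanh(z)^2, and a_next already equals tanh(z).
    dz = (1.0 - a_next ** 2) * da_next
    return {
        "dx_t": Wax.T @ dz,                    # gradient w.r.t. the input frame
        "da_prev": Waa.T @ dz,                 # gradient w.r.t. the previous activation
        "dWax": dz @ x_t.T,                    # gradient w.r.t. input weights
        "dWaa": dz @ a_prev.T,                 # gradient w.r.t. recurrent weights
        "dba": dz.sum(axis=1, keepdims=True),  # gradient w.r.t. the bias
    }

def rnn_forward(x, a0, Wax, Waa, ba):
    # x has shape (n_x, m, T_x); returns the list [a<0>, a<1>, ..., a<T_x>].
    a = [a0]
    for t in range(x.shape[2]):
        a.append(rnn_cell_forward(x[:, :, t], a[-1], Wax, Waa, ba))
    return a

def rnn_backward(da, a, x, Wax, Waa):
    # da has shape (n_a, m, T_x): the part of dL/da<t> that comes directly
    # from the output at step t (not from later time steps).
    n_a, m, T_x = da.shape
    grads = {
        "dx": np.zeros_like(x),
        "dWax": np.zeros_like(Wax),
        "dWaa": np.zeros_like(Waa),
        "dba": np.zeros((n_a, 1)),
    }
    da_prev = np.zeros((n_a, m))  # nothing flows back from beyond the last step
    for t in reversed(range(T_x)):
        # Total gradient at a<t>: direct term plus what flows back from step t+1.
        step = rnn_cell_backward(da[:, :, t] + da_prev, a[t + 1],
                                 x[:, :, t], a[t], Wax, Waa)
        grads["dx"][:, :, t] = step["dx_t"]
        # The weights are shared across time steps, so their gradients add up.
        grads["dWax"] += step["dWax"]
        grads["dWaa"] += step["dWaa"]
        grads["dba"] += step["dba"]
        da_prev = step["da_prev"]
    grads["da0"] = da_prev
    return grads
```

The two points that usually cause confusion are both in the loop: the gradient passed into each cell is the sum of the direct term and the term carried back from the following time step, and the parameter gradients are accumulated over all time steps because the same weights are reused at every step.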