Backpropagation through time derivation question

Hello @Eugene_Ku

Matrix differentiation is different from scalar differentiation, and the chain rule in the form we know it is only guaranteed to work as written for scalars. This post gave an example of when it breaks.

The most brute-force way is to break a matrix equation down into a list of scalar equations, differentiate each one, and then reassemble the results into a matrix equation. This can easily be done with some simple low-rank matrices (as in the linked post).
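To make the brute-force idea concrete, here is a minimal sketch on a toy problem that I made up for illustration (it is not the equation from the original question): `y = W x` with loss `L = 0.5 * sum(y_i^2)`. Written as scalars, `y_i = sum_j W_ij * x_j`, so `dL/dW_ij = y_i * x_j`, and putting those entries back into a matrix gives the outer product of `y` and `x`. The function names and the numbers below are all assumptions of mine, and the finite-difference check is only there as a sanity test.

```python
# Toy setup (hypothetical, for illustration only):
#   y = W x,  L = 0.5 * sum_i y_i**2

def matvec(W, x):
    # y_i = sum_j W_ij * x_j, one scalar equation per row of W
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def loss(W, x):
    return 0.5 * sum(y_i ** 2 for y_i in matvec(W, x))

def grad_scalarwise(W, x):
    # Differentiate each scalar equation: dL/dW_ij = y_i * x_j,
    # then reassemble the entries into a matrix shaped like W.
    y = matvec(W, x)
    return [[y_i * x_j for x_j in x] for y_i in y]

def grad_numeric(W, x, eps=1e-6):
    # Central finite differences, perturbing one entry of W at a time.
    grad = []
    for i, row in enumerate(W):
        grad_row = []
        for j in range(len(row)):
            W_plus = [r[:] for r in W]
            W_minus = [r[:] for r in W]
            W_plus[i][j] += eps
            W_minus[i][j] -= eps
            grad_row.append((loss(W_plus, x) - loss(W_minus, x)) / (2 * eps))
        grad.append(grad_row)
    return grad

W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # deliberately non-square: 2 x 3
x = [0.5, -1.0, 2.0]

analytic = grad_scalarwise(W, x)
numeric = grad_numeric(W, x)
max_err = max(
    abs(a - n)
    for row_a, row_n in zip(analytic, numeric)
    for a, n in zip(row_a, row_n)
)
print("gradient shape:", len(analytic), "x", len(analytic[0]))
print("max abs difference vs finite differences:", max_err)
```

Note that the reassembled gradient has the same 2 x 3 shape as `W`, which is the kind of dimension check that helps confirm the matrix form was put back together correctly.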

Does it have to be? Is there any way we can make it not be a square matrix? If not, would you mind showing us how you used the dimensions to verify them?

I vote Yes, and we can come back to it later on if it is, or becomes, very important.

