Backpropagation through time derivation question

Hi, I have a question about deriving backpropagation through time: I do not get how to determine the order of the matrices after taking the derivative. For example, what rule makes dWax have dtanh in the front, while dxt has dtanh at the back? To be honest, I can figure out the order for most of them if I write out the dimensions, since I know what dWax is going to look like, for example. However, I cannot do that for dWaa because it is a square matrix. Thank you so much to whoever answers; it’s been a long 3 days reading about matrix calculus haha.
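For concreteness, here is a minimal NumPy sketch of the shape bookkeeping I mean, assuming the usual cell a_next = tanh(Wax @ xt + Waa @ a_prev + ba), with Wax of shape (n_a, n_x), Waa of shape (n_a, n_a), xt of shape (n_x, m), and a_prev of shape (n_a, m); the variable names and sizes below are just my own convention for illustration:

```python
import numpy as np

# Arbitrary small sizes, only for checking dimensions
n_a, n_x, m = 5, 3, 10

Wax = np.random.randn(n_a, n_x)     # (n_a, n_x)
Waa = np.random.randn(n_a, n_a)     # (n_a, n_a) -- square!
xt = np.random.randn(n_x, m)        # (n_x, m)
a_prev = np.random.randn(n_a, m)    # (n_a, m)
dtanh = np.random.randn(n_a, m)     # same shape as a_next: (n_a, m)

# The orderings in question:
dWax = dtanh @ xt.T                 # (n_a, m) @ (m, n_x) -> (n_a, n_x), matches Wax
dxt = Wax.T @ dtanh                 # (n_x, n_a) @ (n_a, m) -> (n_x, m), matches xt

# For dWaa the shapes alone cannot decide the order:
print((dtanh @ a_prev.T).shape)     # (n_a, n_a)
print((a_prev @ dtanh.T).shape)     # also (n_a, n_a) -- both match Waa
```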

Speaking of which, when there is a formula or something whose derivation you don’t understand and it bothers you, would you guys move on after a while?

Hello @Eugene_Ku

Matrix differentiation is different from scalar differentiation, and the chain rule that we know works perfectly only in the scalar case. This post gave an example of when it breaks.

The most brute-force way is to break a matrix equation down into a list of scalar equations, do the differentiation, and assemble them back into a matrix equation. This can easily be done with some simple low-rank matrices (like in the linked post).
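To make that concrete, here is a small worked instance (my own notation: z for the pre-activation W_{aa} a_{t-1} + W_{ax} x_t + b_a, dz for ∂L/∂z, with i indexing hidden units and k indexing samples). Writing the scalar equations and re-assembling them fixes the ordering for dW_{aa} even though both orderings would have the right shape:

```latex
z_{ik} = \sum_j (W_{aa})_{ij}\,(a_{t-1})_{jk} + \sum_j (W_{ax})_{ij}\,(x_t)_{jk} + (b_a)_i

\frac{\partial L}{\partial (W_{aa})_{ij}}
  = \sum_k \frac{\partial L}{\partial z_{ik}}\,
           \frac{\partial z_{ik}}{\partial (W_{aa})_{ij}}
  = \sum_k dz_{ik}\,(a_{t-1})_{jk}
  = \left( dz\, a_{t-1}^{\top} \right)_{ij}
\quad\Longrightarrow\quad
dW_{aa} = dz\, a_{t-1}^{\top}
```

The same element-wise bookkeeping gives dW_{ax} = dz x_t^T and dx_t = W_{ax}^T dz, which is exactly the "dtanh in front vs. dtanh at the back" pattern in the question.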

Does it have to be? Is there any way we can make it not be a square matrix? If not, would you mind showing us how you used the dimensions to verify them?

I vote yes, and come back to it later on if it is/becomes very important.

Cheers,
Raymond


Hi Raymond, good to see you again. On your comment,

> Does it have to be? Is there any way we can make it not be a square matrix? If not, would you mind showing us how you used the dimensions to verify them?

Edit: oops, I thought I got it, but I don’t. So, back to dWaa: the dimension-checking method would not work because either order results in the same dimensions, right?
This is how I dimension-check, btw (I used z instead of tanh):


This method would work for all except dWaa :frowning:
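One way around it that I find handy (just a sketch, my own workaround rather than anything from the course): a finite-difference check on a tiny random example distinguishes the two orderings even though their shapes agree.

```python
import numpy as np

np.random.seed(0)
n_a, n_x, m = 4, 3, 2
Wax = np.random.randn(n_a, n_x)
Waa = np.random.randn(n_a, n_a)
ba = np.random.randn(n_a, 1)
xt = np.random.randn(n_x, m)
a_prev = np.random.randn(n_a, m)
da_next = np.random.randn(n_a, m)   # pretend upstream gradient dL/da_next

def loss(Waa_):
    # scalar proxy loss whose gradient w.r.t. a_next is exactly da_next
    a_next = np.tanh(Wax @ xt + Waa_ @ a_prev + ba)
    return np.sum(da_next * a_next)

# analytic candidates for dWaa
a_next = np.tanh(Wax @ xt + Waa @ a_prev + ba)
dtanh = da_next * (1 - a_next ** 2)
cand1 = dtanh @ a_prev.T    # dtanh in front
cand2 = a_prev @ dtanh.T    # dtanh at the back (same shape!)

# central-difference numerical gradient, one entry of Waa at a time
eps = 1e-6
num = np.zeros_like(Waa)
for i in range(n_a):
    for j in range(n_a):
        E = np.zeros_like(Waa)
        E[i, j] = eps
        num[i, j] = (loss(Waa + E) - loss(Waa - E)) / (2 * eps)

print(np.allclose(num, cand1, atol=1e-5))   # True: dWaa = dtanh @ a_prev.T
print(np.allclose(num, cand2, atol=1e-5))   # False in general
```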

Thank you so much, as always, Raymond! You are like a savior showing up when I thought I was all alone in a sandstorm, hah.

Hello @Eugene_Ku,

Yes, I just checked too. Then I think the luck is that we have dWax as our reference - dWaa should look like dWax :stuck_out_tongue: . I know this is a dirty way, but… it is a way. As for why dWax should look like that, my reason is as I said - matrix differentiation isn’t like scalar differentiation, so the ordering is not arbitrary.

But yes, you are right, dimension checking won’t be helpful for dWaa.

Not everyone is willing to be in a sandstorm, but if the ticket to the dream is there, what choice do you and I have? :wink: Let’s go!

Cheers,
Raymond


Ahaha, that’s totally right!

@rmwkwok Hi Raymond, I decided to make a post about this topic! Matrix Calculus For Deep Learning: Taking Derivatives of Matrices Through Time | by Eugene Ku | Aug, 2023 | Medium
Let me know what you think!

Hey @Eugene_Ku,

Wonderful! That is on my to-do list for today. :raised_hands:

Raymond


Hello @Eugene_Ku,

Interesting notes! :wink:

“0 if i != a” is very intuitive, because it is just saying that the a-th sample in X has no impact on the i-th sample in Y, which is, of course, true.

F_{abcd} is a very useful notation! It tells you immediately that it is 4-dimensional, and we can play with the subscripts in our arguments.
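If it helps anyone reading along, here is one tiny concrete instance of such a 4-index object (my own example with samples as rows, not necessarily the exact setup in the article):

```latex
Y_{ab} = \sum_c X_{ac}\,W_{cb}
\quad\Longrightarrow\quad
F_{abcd} = \frac{\partial Y_{ab}}{\partial X_{cd}} = \delta_{ac}\,W_{db}
```

which is 0 whenever a != c, i.e. sample c of X has no effect on sample a of Y.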

Cheers,
Raymond

PS: The notes could be improved if they are intended for a beginner-level audience; otherwise, they are interesting notes with a lot of helpful discussion. Thanks for sharing, @Eugene_Ku!


Thank you! I’ll keep that in mind. I might make one for the multivariate chain rule as well, but thanks for reading and for the feedback :blush: