Hello,
I have a math question. I'm trying to figure out how to derive the equations given in the first programming assignment of Week 1 of Course 5 (Sequence Models). I'm referring to part 7 of the assignment, the optional section on LSTM backpropagation. The equations are screenshotted below:
I'm trying to work out mathematically where these come from. I understand the first equation, but the second one, for dp c̃, really stumps me. Specifically, I can't see where the dc_next * gamma_u term comes from. I figure this involves derivatives of Hadamard products, and since I'm new to this I'm probably just missing something. The forward propagation equations are shown in the lectures and can be used to derive the equations above.
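A sketch of where the term can come from, assuming the lecture's forward equations $c_{next} = \Gamma_f \odot c_{prev} + \Gamma_u \odot \tilde{c}$ and $a_{next} = \Gamma_o \odot \tanh(c_{next})$ (all gates at time $t$; $\odot$ is the Hadamard product, written `*` in the notebook), and assuming `dc_next` is the gradient that flows back into $c_{next}$ from the next time step. The key point is that $c_{next}$ reaches the loss along two paths: directly through the next cell's state, and through $a_{next}$. Because Hadamard products act elementwise, each local derivative is elementwise too, so the two contributions simply add:

$$
\delta_c = dc_{next} + da_{next} \odot \Gamma_o \odot \left(1 - \tanh^2(c_{next})\right)
$$

Since $\tilde{c} = \tanh(p\tilde{c})$ enters $c_{next}$ only through $\Gamma_u \odot \tilde{c}$:

$$
dp\tilde{c} = \delta_c \odot \Gamma_u \odot \left(1 - \tilde{c}^2\right) = \left(dc_{next} \odot \Gamma_u + \Gamma_o \odot \left(1 - \tanh^2(c_{next})\right) \odot \Gamma_u \odot da_{next}\right) \odot \left(1 - \tilde{c}^2\right)
$$

The same $\delta_c$ feeds the update and forget gates, so analogous $dc_{next}$ terms appear there as well:

$$
d\gamma_u = \delta_c \odot \tilde{c} \odot \Gamma_u \odot (1 - \Gamma_u), \qquad d\gamma_f = \delta_c \odot c_{prev} \odot \Gamma_f \odot (1 - \Gamma_f)
$$

A derivation that follows only the path through $a_{next}$ (for example, one written for the last time step, where nothing flows back into the cell state) produces these expressions without the $dc_{next}$ terms.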
Hello @sidewinder4489, it seems the Medium post has what you're asking for. I suggest googling for it (or other sources) and checking whether the steps are correct.
Hello @rmwkwok,
I looked through the link you provided: LSTM - Derivation of Back propagation through time - GeeksforGeeks
I followed that logic and I agree with it; it matches what I'm getting. That said, it also appears to be missing the dc_next * gamma_u term in the second equation.
This term is what's giving me trouble, and I don't understand where it comes from. I circled it in red in the second equation (as well as in the third and fourth equations).
When I take the equations from the link you provided and rewrite them in terms of the variables in our homework assignment, I get the expression provided, but without the circled term. I’ll show you below:
Could you help me understand what I’m missing here? I don’t know where that circled term in the second equation is coming from. Thank you for your assistance!
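One way to check numerically that the circled term belongs there is a finite-difference gradient check. The NumPy sketch below is an illustration, not the assignment's code: it works directly on the gate pre-activations (skipping the weight matrices), and it uses a hypothetical toy loss J = sum(da_next * a_next) + sum(dc_next * c_next), chosen so that the upstream gradient with respect to a_next is da_next and the gradient arriving directly at c_next from the next time step is dc_next. All function names here are made up for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(p_ct, p_f, p_u, p_o, c_prev):
    """One LSTM cell step computed from gate pre-activations (weights omitted)."""
    c_tilde = np.tanh(p_ct)  # candidate cell value
    gamma_f, gamma_u, gamma_o = sigmoid(p_f), sigmoid(p_u), sigmoid(p_o)
    c_next = gamma_f * c_prev + gamma_u * c_tilde
    a_next = gamma_o * np.tanh(c_next)
    return c_tilde, gamma_u, gamma_o, c_next, a_next


def toy_loss(p_ct, p_f, p_u, p_o, c_prev, da_next, dc_next):
    # Hypothetical loss: dJ/da_next = da_next, and the direct dJ/dc_next = dc_next
    _, _, _, c_next, a_next = lstm_forward(p_ct, p_f, p_u, p_o, c_prev)
    return np.sum(da_next * a_next) + np.sum(dc_next * c_next)


def analytic_dp_ctilde(p_ct, p_f, p_u, p_o, c_prev, da_next, dc_next,
                       include_dc_term=True):
    c_tilde, gamma_u, gamma_o, c_next, _ = lstm_forward(p_ct, p_f, p_u, p_o, c_prev)
    # Path 1: through a_next = gamma_o * tanh(c_next)
    delta_c = da_next * gamma_o * (1 - np.tanh(c_next) ** 2)
    if include_dc_term:
        # Path 2: straight into the next cell's state
        delta_c = delta_c + dc_next
    # c_next depends on c_tilde via gamma_u * c_tilde, and c_tilde = tanh(p_ct)
    return delta_c * gamma_u * (1 - c_tilde ** 2)


def numerical_dp_ctilde(p_ct, p_f, p_u, p_o, c_prev, da_next, dc_next, eps=1e-6):
    grad = np.zeros_like(p_ct)
    for i in range(p_ct.size):
        step = np.zeros_like(p_ct)
        step.flat[i] = eps
        j_plus = toy_loss(p_ct + step, p_f, p_u, p_o, c_prev, da_next, dc_next)
        j_minus = toy_loss(p_ct - step, p_f, p_u, p_o, c_prev, da_next, dc_next)
        grad.flat[i] = (j_plus - j_minus) / (2 * eps)  # central difference
    return grad


rng = np.random.default_rng(0)
n_a = 10
p_ct, p_f, p_u, p_o, c_prev, da_next, dc_next = (
    rng.standard_normal(n_a) for _ in range(7)
)
args = (p_ct, p_f, p_u, p_o, c_prev, da_next, dc_next)

num = numerical_dp_ctilde(*args)
with_term = analytic_dp_ctilde(*args, include_dc_term=True)
without_term = analytic_dp_ctilde(*args, include_dc_term=False)

print(np.max(np.abs(num - with_term)))     # close to 0 (finite-difference error only)
print(np.max(np.abs(num - without_term)))  # clearly nonzero
```

With the dc_next term included, the analytic and numerical gradients should agree up to finite-difference error; dropping it leaves a gap of exactly dc_next * Γ_u * (1 - c̃²).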
Edit: I realize there's probably a mistake in the last equation that I need to fix. I'll do it tomorrow, since I need to sleep now. That said, my question remains the same.
Hello Iain @sidewinder4489,
It's certainly good that you read more sources, but when you said “the link you provided”, if you meant the message I quoted above, I was actually referring to the Medium post (by Kartik Shenoy) that appears in my screenshot. Could you also take a look at that one and check the math? I do see the circled term there, and the author says he uses the same set of symbols as our course.
Cheers,
Raymond
Hello Raymond,
Thank you for the link. That’s exactly what I needed.