Explanation for derived gradients for LSTM back-prop?

dwyerfire · August 30, 2021, 4:18pm

Hello, here is my question from the original discussion board:

paulinpaloalto · August 30, 2021, 7:20pm

This looks like just a notation issue. That is to say: what do you think Prof Ng means by dtanh? And what does he mean by db_a? Also note that we’re doing partial derivatives here, not univariate ones.

dwyerfire · September 6, 2021, 4:18pm

Thanks for the reply!
So are you saying that what he means by what he wrote is what I wrote more verbosely?

I’m a bit confused because I thought he said that notation is shorthand for derivative w.r.t. the loss not the output (i.e. dtanh := dtanh/dL). Am I mistaken?

paulinpaloalto · September 6, 2021, 4:27pm

Yes, you have it “upside down”. When he says dfoo, what he means is $\displaystyle \frac{\partial L}{\partial foo}$ or perhaps $\displaystyle \frac{\partial J}{\partial foo}$ depending on the context. And sometimes when it’s just a Chain Rule factor at a given layer, the numerator contains something other than L or J. That’s part of the problem: there is some built-in ambiguity in his notational conventions. He then refers to it as “the gradient of foo”, but that’s really not quite right either: it’s the gradient of J (or whatever) w.r.t. foo.
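
As a worked illustration of that convention, here is a hedged sketch for a single tanh cell. It assumes that dtanh denotes the gradient of the loss with respect to the input of the tanh (called $z$ here), that $da$ is the gradient arriving from later in the network, and that the cell has the usual form shown on the first line. The symbols $z$, $x$, $a_{prev}$ and the batch index $i$ are my labels, not necessarily the ones used in the lecture.

```latex
\begin{aligned}
z &= W_{ax}\,x + W_{aa}\,a_{prev} + b_a, \qquad a = \tanh(z) \\
da &:= \frac{\partial L}{\partial a} \\
dtanh &:= \frac{\partial L}{\partial z}
       = \frac{\partial L}{\partial a} \odot \frac{\partial a}{\partial z}
       = da \odot \left(1 - a^{2}\right) \\
db_a &:= \frac{\partial L}{\partial b_a}
       = \sum_{i \,\in\, \text{batch}} dtanh^{(i)}
\end{aligned}
```

Under those assumptions, every "d" quantity has $L$ in the numerator, and the factor $(1 - a^{2})$ is the only place where a derivative of tanh itself appears.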

For convenience, he also makes a few other shortcuts here. E.g. in the notation we use here, the gradient of an object has the same shape as the object, which makes the parameter update process simpler. If you really go “full math”, it turns out that the gradient ends up being the shape of the transpose of the base object. So I salute your desire to understand in more detail what is going on here, but these courses are specifically designed not to require even univariate calculus as a prerequisite. So there’s no way he can show the “full math” in this context. The math here is stuff that most people haven’t seen unless they were math, physics, or EE majors. Here’s a thread which has pointers to derivations and info about matrix calculus if you want to go deeper.
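
If it helps to see the "same shape as the object" convention in action, here is a minimal NumPy sketch. It is not the assignment code: the toy loss `L = sum(a**2) / 2`, the function names, and the dimensions are all assumptions chosen for illustration. It computes `dW` and `db` by the Chain Rule and compares them against central finite differences of the loss.

```python
import numpy as np

def forward(W, b, x):
    """One tanh layer followed by a toy scalar loss L = sum(a**2) / 2."""
    z = W @ x + b              # pre-activation, shape (n_a, m); b broadcasts over the batch
    a = np.tanh(z)             # activation, shape (n_a, m)
    L = 0.5 * np.sum(a ** 2)   # hypothetical loss, chosen only for illustration
    return a, L

def backward(W, b, x):
    """Return dW, db where 'dfoo' means the partial derivative of L w.r.t. foo."""
    a, _ = forward(W, b, x)
    da = a                                   # dL/da for the toy loss above
    dz = da * (1.0 - a ** 2)                 # Chain Rule through tanh
    dW = dz @ x.T                            # same shape as W
    db = np.sum(dz, axis=1, keepdims=True)   # same shape as b (summed over the batch)
    return dW, db

def numerical_grad(f, p, eps=1e-6):
    """Central finite differences of the scalar f() w.r.t. array p (perturbed in place)."""
    g = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        old = p[idx]
        p[idx] = old + eps
        plus = f()
        p[idx] = old - eps
        minus = f()
        p[idx] = old               # restore the original entry
        g[idx] = (plus - minus) / (2 * eps)
    return g

rng = np.random.default_rng(0)
n_a, n_x, m = 3, 4, 5
W = rng.standard_normal((n_a, n_x))
b = rng.standard_normal((n_a, 1))
x = rng.standard_normal((n_x, m))

dW, db = backward(W, b, x)
dW_num = numerical_grad(lambda: forward(W, b, x)[1], W)
db_num = numerical_grad(lambda: forward(W, b, x)[1], b)

print("shapes match:", dW.shape == W.shape, db.shape == b.shape)
print("values match:", np.allclose(dW, dW_num, atol=1e-6),
      np.allclose(db, db_num, atol=1e-6))
```

The finite-difference check perturbs one entry of `W` or `b` at a time and measures the change in the loss, which is exactly the "L in the numerator, foo in the denominator" reading of `dfoo`.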
