Hello, here is my question from the original discussion board:

This looks like just a notation issue. That is to say: what do you think Prof Ng means by dtanh? And what does he mean by db_a? Also note that we’re doing partial derivatives here, not univariate ones.

Thanks for the reply!

So are you saying that what he means by what he wrote is what I wrote more verbosely?

I’m a bit confused because I thought he said that notation is shorthand for derivative w.r.t. the loss, not the output (i.e. dtanh := dtanh/dL). Am I mistaken?

Yes, you have it “upside down”. When he says dfoo, what he means is $\displaystyle \frac{\partial L}{\partial foo}$ or perhaps $\displaystyle \frac{\partial J}{\partial foo}$ depending on the context. And sometimes when it’s just a Chain Rule factor at a given layer, the numerator contains something other than L or J. That’s part of the problem: there is some built-in ambiguity in his notational conventions. He then refers to it as “the gradient of *foo*”, but that’s really not quite right either: it’s the gradient of J (or whatever) w.r.t. *foo*.
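To make the convention concrete, here is a minimal NumPy sketch. The toy loss `L = 0.5 * sum(tanh(z)**2)` and the function names are my own assumptions for illustration, not anything from the course notebooks. It reads `da` as ∂L/∂a and `dz` as ∂L/∂z, then checks `dz` against a finite-difference estimate of ∂L/∂z.

```python
import numpy as np

def forward(z):
    """Toy setup (hypothetical): a = tanh(z), L = 0.5 * sum(a**2)."""
    a = np.tanh(z)
    return a, 0.5 * np.sum(a ** 2)

def backward(z):
    a, _ = forward(z)
    da = a                     # "da" is shorthand for dL/da (equals a for this loss)
    dz = da * (1.0 - a ** 2)   # "dz" is dL/dz = dL/da * da/dz (Chain Rule)
    return da, dz

def numerical_dz(z, eps=1e-6):
    """Central-difference estimate of dL/dz, one element at a time."""
    grad = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step.flat[i] = eps
        grad.flat[i] = (forward(z + step)[1] - forward(z - step)[1]) / (2 * eps)
    return grad

z = np.array([[0.5, -1.0], [2.0, 0.1]])
da, dz = backward(z)
print(np.allclose(dz, numerical_dz(z), atol=1e-6))
```

The point of the check is only that the quantity named `dz` matches the derivative of the loss with respect to `z`, so L is in the numerator and `z` is in the denominator.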

For convenience, he also makes a few other shortcuts here. E.g. in the notation we use here the gradient of an object has the same shape as the object, which makes the parameter update process simpler. If you really go “full math”, it turns out that the gradient ends up being the shape of the transpose of the base object. So I salute your desire to understand in more detail what is going on here, but these courses are specifically designed not to require even univariate calculus as a prerequisite. So there’s no way he can show the “full math” in this context. The math here is stuff that most people haven’t seen unless you were a math or physics or EE major. Here’s a thread which has pointers to derivations and info about matrix calculus if you want to go deeper.
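Here is a small sketch of why the "same shape as the object" convention is handy for updates. The linear layer, the loss, and the learning rate are hypothetical choices of mine, picked only so that `W` is non-square and the shapes are easy to see.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))   # parameter matrix, shape (2, 3)
x = rng.standard_normal((3, 1))   # a single input column, shape (3, 1)

y = W @ x                         # shape (2, 1)
L = 0.5 * np.sum(y ** 2)          # toy scalar loss (assumption)

dy = y                            # dL/dy, same shape as y
dW = dy @ x.T                     # dL/dW stored with the same shape as W: (2, 3)

learning_rate = 0.1
W_new = W - learning_rate * dW    # elementwise update works directly

# With the transposed layout the shapes would not line up for a non-square W.
try:
    W - learning_rate * dW.T
    transposed_update_ok = True
except ValueError:
    transposed_update_ok = False

print(dW.shape, dW.T.shape, transposed_update_ok)
```

With the transposed layout you would have to transpose back before every update, which is the extra bookkeeping the course convention avoids.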