Yes, you have it “upside down”. When he says *dfoo*, what he means is $\displaystyle \frac{\partial L}{\partial foo}$ or perhaps $\displaystyle \frac{\partial J}{\partial foo}$, depending on the context. And sometimes, when it’s just a Chain Rule factor at a given layer, the numerator contains something other than L or J. That’s part of the problem: there is some built-in ambiguity in his notational conventions. He then refers to it as “the gradient of *foo*”, but that’s not quite right either: it’s the gradient of J (or whatever the numerator is) w.r.t. *foo*.

For convenience, he also takes a few other shortcuts here. E.g. in the notation we use here, the gradient of an object has the same shape as the object, which makes the parameter update process simpler. If you really go “full math”, it turns out that the gradient ends up having the shape of the transpose of the base object. So I salute your desire to understand in more detail what is going on here, but these courses are specifically designed not to require even univariate calculus as a prerequisite. So there’s no way he can show the “full math” in this context. The math here is stuff that most people haven’t seen unless they were math, physics or EE majors. Here’s a thread which has pointers to derivations and info about matrix calculus if you want to go deeper.
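To make the “same shape as the object” convention concrete, here is a minimal NumPy sketch. It is not taken from the course materials: the single linear layer, the squared-error cost, the shapes and every variable name are assumptions chosen for illustration. It computes `dW`, meaning the gradient of J w.r.t. `W`, checks it against finite differences, and shows that the update is a plain elementwise subtraction because the shapes match.

```python
import numpy as np

# Hypothetical toy setup: one linear layer with a squared-error cost.
rng = np.random.default_rng(0)
m = 5                            # number of examples
X = rng.normal(size=(3, m))      # inputs, one column per example
Y = rng.normal(size=(2, m))      # targets
W = rng.normal(size=(2, 3))      # weights
b = np.zeros((2, 1))             # bias

def cost(W, b):
    """J = 1/(2m) * sum((W X + b - Y)^2)."""
    Z = W @ X + b
    return np.sum((Z - Y) ** 2) / (2 * m)

# Analytic gradients, written in the "d-prefix" style:
# dfoo means "partial derivative of J with respect to foo".
Z = W @ X + b
dZ = (Z - Y) / m                          # dJ/dZ, same shape as Z
dW = dZ @ X.T                             # dJ/dW, same shape as W
db = np.sum(dZ, axis=1, keepdims=True)    # dJ/db, same shape as b

# Finite-difference check of dW, one entry of W at a time.
eps = 1e-6
dW_numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus = W.copy()
        W_plus[i, j] += eps
        W_minus = W.copy()
        W_minus[i, j] -= eps
        dW_numeric[i, j] = (cost(W_plus, b) - cost(W_minus, b)) / (2 * eps)

max_abs_diff = np.max(np.abs(dW - dW_numeric))

# Because dW has the same shape as W, the update needs no transpose.
learning_rate = 0.1
W_updated = W - learning_rate * dW

print("W shape:", W.shape, "dW shape:", dW.shape)
print("max |analytic - numeric|:", max_abs_diff)
```

If the gradient were stored with the transposed shape instead, the update line would need an explicit `.T`, which is the extra bookkeeping the course convention avoids.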