Where does it say that Prof Ng is required to be consistent in his notation? Those d values are just shorthand. It turns out that:
dW^{[l]} = \displaystyle \frac {\partial J}{\partial W^{[l]}}
You just have to understand the context to see why the formulas turn out the way that they do.
Keep in mind that the only dX values that are partial derivatives of J are the dW^{[l]} and db^{[l]} gradients. Every other value (dZ^{[l]} and dA^{[l]}, for example) is a partial derivative of something other than J, namely the per-sample loss L.
You have to think through how the Chain Rule applies when you compute the gradients of the W or b values at one of the inner layers of the network. The L to J transition (averaging the per-sample losses over the m samples) is always there, but it’s the last step, right? That is why the factor of \frac {1}{m} only shows up in the dW^{[l]} and db^{[l]} formulas. You don’t want to end up with multiple factors of \frac {1}{m} …
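
If it helps to see that numerically, here is a minimal sketch, assuming the simplest possible case: a single sigmoid unit (logistic regression) with cross-entropy loss. The data values and all the variable and function names are made up for illustration and are not taken from the course notebooks. It computes dZ with no \frac {1}{m} in it, applies the \frac {1}{m} once when forming dW and db, and then compares those against finite difference estimates of the gradients of J.

```python
import math

# Made-up toy data: m = 4 samples, 2 features each (illustrative only).
X = [[0.5, -1.2], [1.5, 0.3], [-0.7, 0.8], [2.0, -0.5]]
Y = [1.0, 0.0, 1.0, 0.0]
w = [0.1, -0.2]
b = 0.05
m = len(X)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x):
    """Activation of the single sigmoid unit for one sample x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def cost(w, b):
    """J = (1/m) * sum of the per-sample cross-entropy losses L."""
    total = 0.0
    for x, y in zip(X, Y):
        a = forward(w, b, x)
        total += -(y * math.log(a) + (1 - y) * math.log(1 - a))
    return total / m


# dZ is the derivative of the per-sample loss L with respect to z.
# Note there is no 1/m here.
A = [forward(w, b, x) for x in X]
dZ = [a - y for a, y in zip(A, Y)]

# The 1/m enters exactly once, when we average over the samples
# to get the gradients of J.
dW = [sum(dz * x[j] for dz, x in zip(dZ, X)) / m for j in range(len(w))]
db = sum(dZ) / m


def numeric_dW(j, eps=1e-6):
    """Central finite difference estimate of dJ/dw_j."""
    w_plus = list(w)
    w_minus = list(w)
    w_plus[j] += eps
    w_minus[j] -= eps
    return (cost(w_plus, b) - cost(w_minus, b)) / (2 * eps)


def numeric_db(eps=1e-6):
    """Central finite difference estimate of dJ/db."""
    return (cost(w, b + eps) - cost(w, b - eps)) / (2 * eps)


for j in range(len(w)):
    print(f"dW[{j}]: backprop = {dW[j]:.8f}, numeric = {numeric_dW(j):.8f}")
print(f"db:    backprop = {db:.8f}, numeric = {numeric_db():.8f}")
```

The backprop and numeric columns should agree to several decimal places. If you put an extra \frac {1}{m} into dZ as well, the dW and db values would come out too small by a factor of m and the comparison would fail.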