Sorry, but this just says that you are interpreting the notation differently than Prof. Ng does. He's the boss, so he gets to define it:

dZ^{[2]} = \displaystyle \frac {\partial L}{\partial Z^{[2]}}

So it is a vector quantity that still holds a separate value for each sample and has not yet been averaged over the samples. The averaging only happens when he computes the dW and db values. Those are the only gradients taken w.r.t. the cost J, as opposed to the per-sample loss L. All the other quantities are just Chain Rule factors that are used in computing the dW and db gradients.
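If it helps to see this numerically, here is a minimal sketch in plain Python. It assumes a sigmoid output layer with the binary cross-entropy loss, for which dZ^{[2]} = A^{[2]} - Y, and the toy values and helper names (`forward`, `loss_from_z`, `cost`) are made up for illustration, not taken from the course code. It checks that dZ2 matches the derivative of the per-sample loss L with no 1/m factor, while dW2 and db2 match the derivative of the averaged cost J.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W2, b2, A1):
    """Return Z2 and A2 as lists with one entry per sample (column of A1)."""
    m = len(A1[0])
    Z2 = [sum(W2[i] * A1[i][j] for i in range(len(W2))) + b2 for j in range(m)]
    A2 = [sigmoid(z) for z in Z2]
    return Z2, A2

def loss_from_z(z, y):
    """Per-sample cross-entropy loss L as a function of that sample's z."""
    a = sigmoid(z)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def cost(W2, b2, A1, Y):
    """Cost J: the average of the per-sample losses."""
    Z2, _ = forward(W2, b2, A1)
    return sum(loss_from_z(z, y) for z, y in zip(Z2, Y)) / len(Y)

# Hypothetical toy data: 2 hidden units, 3 samples.
A1 = [[0.5, -1.0, 2.0],
      [1.5, 0.3, -0.7]]
Y = [1, 0, 1]
W2 = [0.1, -0.2]
b2 = 0.05
m = len(Y)

Z2, A2 = forward(W2, b2, A1)

# dZ2 = dL/dZ2: one value per sample, no 1/m factor.
dZ2 = [a - y for a, y in zip(A2, Y)]

# The average over samples only appears here, so these are dJ/dW2 and dJ/db2.
dW2 = [sum(dZ2[j] * A1[i][j] for j in range(m)) / m for i in range(len(W2))]
db2 = sum(dZ2) / m

# Numerical checks with central differences.
eps = 1e-6
num_dZ2 = [(loss_from_z(z + eps, y) - loss_from_z(z - eps, y)) / (2 * eps)
           for z, y in zip(Z2, Y)]
num_dW2 = []
for i in range(len(W2)):
    W_plus = list(W2)
    W_plus[i] += eps
    W_minus = list(W2)
    W_minus[i] -= eps
    num_dW2.append((cost(W_plus, b2, A1, Y) - cost(W_minus, b2, A1, Y)) / (2 * eps))
num_db2 = (cost(W2, b2 + eps, A1, Y) - cost(W2, b2 - eps, A1, Y)) / (2 * eps)

print("dZ2 analytic vs numeric dL/dZ2:", dZ2, num_dZ2)
print("dW2 analytic vs numeric dJ/dW2:", dW2, num_dW2)
print("db2 analytic vs numeric dJ/db2:", db2, num_db2)
```

If you instead divided dZ2 by m before using it, the 1/m factor would be applied twice by the time you reach dW2 and db2, and the gradient check against J would fail.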