Week 3,4: Why isn't 1/m part of dz^[L]?

Sorry, but this is not a typo. It looks like one because Prof Ng’s notation is slightly ambiguous: you need to keep track of what the “numerator” is in each partial derivative. Note that:

dA = \displaystyle \frac {\partial L}{\partial A}

But for dW and db the derivatives are of the scalar cost J:

dW = \displaystyle \frac {\partial J}{\partial W}

Of course J is the average over the m samples of the per-sample loss L (a vector with one value per sample), so that’s where the factor of \displaystyle \frac {1}{m} comes in.
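
To make the bookkeeping concrete, here is a sketch of how this plays out at the output layer, assuming a sigmoid output with the cross-entropy loss (the case used in the course). Here m is the number of samples, L^{(i)} is the loss on sample i, and the superscript [L] is the index of the last layer, not the loss:

J = \displaystyle \frac {1}{m} \sum_{i=1}^{m} L^{(i)}

dZ^{[L]} = \displaystyle \frac {\partial L}{\partial Z^{[L]}} = A^{[L]} - Y

dW^{[L]} = \displaystyle \frac {\partial J}{\partial W^{[L]}} = \frac {1}{m} dZ^{[L]} A^{[L-1]T}

db^{[L]} = \displaystyle \frac {\partial J}{\partial b^{[L]}} = \frac {1}{m} \sum_{i=1}^{m} dZ^{[L](i)}

The \displaystyle \frac {1}{m} shows up only in the last two lines, where the derivative switches from L to J.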

The way Prof Ng structures everything here, only the “final” gradients, the ones we actually apply in the parameter update, are derivatives of J. All the rest, such as dA^{[l]} and dZ^{[l]}, are derivatives of L and serve only as Chain Rule factors. The only “final” gradients are those of W^{[l]} and b^{[l]}.
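
If you want to convince yourself numerically, here is a minimal NumPy sketch, not the assignment code. It assumes a single sigmoid layer with cross-entropy loss (so the last layer is also the first), and the function names `cost` and `backward` are made up for this illustration. It computes dZ with no 1/m, puts the 1/m into dW and db, and then checks dW against a finite-difference estimate of the gradient of J.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, b, X, Y):
    """Scalar cost J: the mean of the per-sample cross-entropy losses."""
    A = sigmoid(W @ X + b)
    losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # shape (1, m)
    return losses.mean()

def backward(W, b, X, Y):
    """Hypothetical helper: gradients for one sigmoid layer."""
    m = X.shape[1]
    A = sigmoid(W @ X + b)
    dZ = A - Y                               # derivative of L per sample: no 1/m
    dW = (dZ @ X.T) / m                      # derivative of J: 1/m appears here
    db = dZ.sum(axis=1, keepdims=True) / m   # derivative of J: 1/m appears here
    return dZ, dW, db

# Small random problem: 3 features, m = 5 samples (arbitrary illustrative sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
Y = (rng.random((1, 5)) > 0.5).astype(float)
W = rng.normal(size=(1, 3)) * 0.1
b = np.zeros((1, 1))

dZ, dW, db = backward(W, b, X, Y)

# Central finite differences on the scalar cost J, one weight at a time.
eps = 1e-6
dW_num = np.zeros_like(W)
for j in range(W.shape[1]):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[0, j] += eps
    W_minus[0, j] -= eps
    dW_num[0, j] = (cost(W_plus, b, X, Y) - cost(W_minus, b, X, Y)) / (2 * eps)

# Compare the analytic dW (with 1/m) against the numerical gradient of J.
print(np.allclose(dW, dW_num, atol=1e-6))
```

If you drop the division by m in dW, the comparison fails, because you would then have the gradient of the sum of the losses rather than of their average J.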

Here’s another thread that discusses these issues.