Hi

In the optional video explaining backpropagation in Course 1 (C1) of the Deep Learning Specialization, when we use the whole training set in the X matrix, we should consider the overall cost formula, which includes the 1/m term.

But when the professor calculates dL/dZ[2] = A[2] - y, the 1/m is not included.

Shouldn't it normally be dL/dZ[2] = 1/m * (A[2] - y)?

It's still not clear to me, I'm sorry.

Here is my derivation of dJ/dZ using a batch of m training examples.

You gave the answer right there: notice that this is the derivative of L, not J. Of course L is a vector quantity with m elements. You don’t get the average until you get to the stage of computing derivatives of J, which is the average of L over the m samples. Literally the only quantities in any of this that are derivatives of J are dW and db. Everything else is just a “Chain Rule” factor used to compute dW and db, and is not an average.
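To make this concrete, here is a minimal NumPy sketch of the output layer only. It assumes a sigmoid output with the cross-entropy loss, as in the course, and the names (`cost`, `output_layer_grads`, the random test data) are made up for illustration. dZ2 is computed per example with no 1/m, the 1/m only shows up in dW2 and db2, and a finite-difference check compares dW2 against the numerical gradient of the averaged cost J.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W2, b2, A1, Y):
    """Cost J: the average of the per-example losses L over the m columns."""
    m = Y.shape[1]
    A2 = sigmoid(W2 @ A1 + b2)
    L = -(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))  # shape (1, m), one loss per example
    return L.sum() / m

def output_layer_grads(W2, b2, A1, Y):
    m = Y.shape[1]
    A2 = sigmoid(W2 @ A1 + b2)
    dZ2 = A2 - Y                               # dL/dZ2 per example: no 1/m here
    dW2 = (dZ2 @ A1.T) / m                     # dJ/dW2: the average enters here
    db2 = dZ2.sum(axis=1, keepdims=True) / m   # dJ/db2: and here
    return dZ2, dW2, db2

# Hypothetical toy data: m = 5 examples, 3 hidden units feeding the output layer.
rng = np.random.default_rng(0)
m, n1 = 5, 3
A1 = rng.standard_normal((n1, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
W2 = rng.standard_normal((1, n1))
b2 = np.zeros((1, 1))

dZ2, dW2, db2 = output_layer_grads(W2, b2, A1, Y)

# Central-difference estimate of dJ/dW2, one weight at a time.
eps = 1e-6
num_dW2 = np.zeros_like(W2)
for j in range(n1):
    W_plus, W_minus = W2.copy(), W2.copy()
    W_plus[0, j] += eps
    W_minus[0, j] -= eps
    num_dW2[0, j] = (cost(W_plus, b2, A1, Y) - cost(W_minus, b2, A1, Y)) / (2 * eps)

grad_check_ok = np.allclose(dW2, num_dW2, atol=1e-6)
print("dW2 matches numerical dJ/dW2:", grad_check_ok)
```

If you put the 1/m into dZ2 as well and kept it in dW2, the result would be scaled by 1/m twice and the check against J would no longer agree. Where the factor is written is a notational choice; what matters is that it is applied exactly once on the way to dW and db.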

Thank you