In the optional video explaining backpropagation in Course 1 of the Deep Learning Specialization, when we use the whole training set in the X matrix, we should consider the overall cost formula, which includes the 1/m term.
But when the professor calculates dL/dZ = A - Y, we don't include the 1/m.
Shouldn't it normally be dL/dZ = (1/m)(A - Y)?
Because we then add that factor when computing dW and db. Chain Rule. Please read this.
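In the notation of the lectures, the Chain Rule computation can be sketched like this (for logistic regression, with per-example loss $L^{(i)}$ and cost $J = \frac{1}{m}\sum_i L^{(i)}$):

$$\frac{\partial J}{\partial W} \;=\; \frac{1}{m}\sum_{i=1}^{m}\frac{\partial L^{(i)}}{\partial W} \;=\; \frac{1}{m}\sum_{i=1}^{m}\frac{\partial L^{(i)}}{\partial z^{(i)}}\,\frac{\partial z^{(i)}}{\partial W} \;=\; \frac{1}{m}\,(A - Y)\,X^{T}$$

So $dZ = A - Y$ is the per-example Chain Rule factor with no $\frac{1}{m}$ in it; the $\frac{1}{m}$ only appears at the final step, when the per-example contributions are averaged into $dW$ (and likewise into $db$).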
It's still not clear to me; I'm sorry.
Here is my derivation of dJ/dZ using a batch of m training examples:
You gave the answer right there: notice that what you wrote is the derivative of L, not J. Of course L is a vector quantity with m elements. You don't get the average until you reach the stage of computing derivatives of J, which is the average of L over the m samples. Literally the only quantities in any of this that are derivatives of J are dW and db. Everything else is just a "Chain Rule" factor used to compute dW and db, and those factors are not averages.
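To make this concrete, here is a small NumPy sketch (with made-up sizes and random data, not anything from the course assignments) that computes dW and db exactly as in the lectures, with dZ = A - Y carrying no 1/m, and then checks dW against a finite-difference gradient of J:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5                        # hypothetical sizes: 3 features, 5 examples
X = rng.normal(size=(n, m))        # each column is one training example
Y = rng.integers(0, 2, size=(1, m)).astype(float)
W = rng.normal(size=(1, n))
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, b):
    A = sigmoid(W @ X + b)
    # J is the AVERAGE of the per-example losses L -- this is where 1/m lives
    return -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# Analytic gradients, exactly as in the lectures:
A = sigmoid(W @ X + b)
dZ = A - Y                         # Chain Rule factor: one column per example, no 1/m
dW = (1.0 / m) * dZ @ X.T          # the 1/m appears only here, in dJ/dW
db = (1.0 / m) * np.sum(dZ)        # ... and here, in dJ/db

# Numerical check: finite differences of J with respect to each entry of W
eps = 1e-6
dW_num = np.zeros_like(W)
for i in range(n):
    Wp, Wm = W.copy(), W.copy()
    Wp[0, i] += eps
    Wm[0, i] -= eps
    dW_num[0, i] = (cost(Wp, b) - cost(Wm, b)) / (2 * eps)

print(np.allclose(dW, dW_num, atol=1e-5))  # True: the analytic dW matches
```

If you dropped the 1/m from dW, the check above would fail by exactly a factor of m, which is a quick way to convince yourself where the averaging belongs.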