Hello, my question is about vectorizing backpropagation for, say, logistic regression.

We are trying to minimize the cost function J, which is the average of the losses over all m examples, so J carries a factor of 1/m. However, when we compute, say, dz^(1) as in the picture below, why isn’t there a factor of 1/m? That is, why isn’t dz^(1) equal to (1/m)(a^(1) - y^(1))?

In general, since m is a constant, it doesn’t really matter whether you include the 1/m in the gradients. It just rescales the gradients by a constant factor, and that can be compensated for by choosing a different learning rate.
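To illustrate the learning-rate point, here is a minimal sketch in plain Python (no NumPy, so it runs without extra dependencies). The helper names `gradients` and `train` and the four-example toy dataset are hypothetical, made up only for this demonstration. It trains logistic regression twice: once with averaged gradients (with the 1/m) and learning rate alpha, and once with summed gradients (without the 1/m) and learning rate alpha/m. The two runs should end at the same weights, up to floating-point rounding.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradients(w, b, X, Y, average=True):
    """Logistic-regression gradients; X is a list of examples (each a list of features).
    Hypothetical helper for illustration."""
    m = len(X)
    dw = [0.0] * len(w)
    db = 0.0
    for x, y in zip(X, Y):
        a = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        dz = a - y                      # per-example dL/dz, no 1/m
        for j in range(len(w)):
            dw[j] += x[j] * dz
        db += dz
    if average:                         # include the 1/m from the cost J
        dw = [g / m for g in dw]
        db /= m
    return dw, db

def train(X, Y, lr, steps, average):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(steps):
        dw, db = gradients(w, b, X, Y, average)
        w = [wj - lr * g for wj, g in zip(w, dw)]
        b -= lr * db
    return w, b

# Toy data, invented for this sketch
X = [[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.0]]
Y = [1, 0, 1, 0]
m = len(X)
alpha = 0.1

w_avg, b_avg = train(X, Y, alpha, 100, average=True)       # with 1/m, rate alpha
w_sum, b_sum = train(X, Y, alpha / m, 100, average=False)  # without 1/m, rate alpha/m
print(w_avg, b_avg)
print(w_sum, b_sum)
```

Dropping the 1/m makes each step m times larger, so dividing the learning rate by m undoes it exactly; keeping the 1/m simply means the learning rate does not have to change when the dataset size changes.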

Hello @Fizay-Noah_Lee,

The 1/m is on the right-hand side of the slide:

and it continues to show up in the next slide:

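Note that dz is not the final gradient; it is just an intermediate variable that we use to compute the final dw and db. Each dz^(i) = a^(i) - y^(i) is the derivative of the single-example loss with respect to z^(i), and the 1/m from averaging enters when those per-example terms are summed into dw and db. dw and db are the gradients we need for gradient descent, and the 1/m is never left out of them.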
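One way to convince yourself that the 1/m belongs in dw and db rather than in dz is a numerical gradient check. The sketch below is an illustration, not the course's code: it uses plain Python lists, and the names `cost`, `backprop`, `numerical_grad` and the toy data are hypothetical. It computes dz^(i) = a^(i) - y^(i) with no 1/m, forms dw = (1/m) Σ x^(i) dz^(i) and db = (1/m) Σ dz^(i), and compares them with central finite differences of J. With NumPy and X stored as an (n, m) array, the same steps would read roughly `dZ = A - Y`, `dw = X @ dZ.T / m`, `db = np.sum(dZ) / m`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, X, Y):
    # J = (1/m) * sum of per-example cross-entropy losses
    m = len(X)
    total = 0.0
    for x, y in zip(X, Y):
        a = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        total += -(y * math.log(a) + (1 - y) * math.log(1 - a))
    return total / m

def backprop(w, b, X, Y):
    m = len(X)
    dZ = []
    for x, y in zip(X, Y):
        a = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        dZ.append(a - y)  # dz^(i) = dL/dz^(i): no 1/m here
    # The 1/m from the average in J enters only here
    dw = [sum(x[j] * dz for x, dz in zip(X, dZ)) / m for j in range(len(w))]
    db = sum(dZ) / m
    return dZ, dw, db

def numerical_grad(w, b, X, Y, eps=1e-6):
    # Central differences of J with respect to each w_j and b
    dw = []
    for j in range(len(w)):
        wp = list(w); wp[j] += eps
        wm = list(w); wm[j] -= eps
        dw.append((cost(wp, b, X, Y) - cost(wm, b, X, Y)) / (2 * eps))
    db = (cost(w, b + eps, X, Y) - cost(w, b - eps, X, Y)) / (2 * eps)
    return dw, db

# Toy data and parameters, invented for this sketch
X = [[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.0]]
Y = [1, 0, 1, 0]
w = [0.3, -0.2]
b = 0.1

dZ, dw, db = backprop(w, b, X, Y)
num_dw, num_db = numerical_grad(w, b, X, Y)
print(dw, num_dw)
print(db, num_db)
```

The analytic and numerical values should agree to about six decimal places. If you instead summed dZ without dividing by m, db would come out m times larger than the numerical derivative of J.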

Cheers,

Raymond