In week 4, Andrew computes the derivative of the loss with respect to the last layer's output (A[L]), but his formula in the vectorized implementation has no 1/m. Does anyone know why he did not write 1/m in the derivative?

Hi @HamedGholami; welcome to the DLS specialization. Are you referring to a video lecture, a notebook assignment, or both?

Hi @kenb; thank you for the kind words.

I’m referring to the 6th lecture; you can see dA at 8:12, written in the bottom right corner.

I understand it now. It is because we calculate the derivative separately for every training example.

Yes, exactly! When you see L, that is the vector-valued loss function, with one value per sample. The average doesn’t come into the picture until you start taking derivatives of J, which is the average of L across the samples. The dAL value is just one of the Chain Rule factors you need to compute the actual gradients of J w.r.t. the various parameters.
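
To make this concrete, here is a minimal NumPy sketch of where the 1/m shows up. It assumes a single sigmoid output layer with cross-entropy loss; the names `sigmoid`, `cost`, `W`, `b`, `X`, `Y` and the random toy data are my own illustration, not taken from the lecture or the assignment. `dAL` is computed per example with no 1/m, and the 1/m only enters when the per-example terms are averaged into `dW` and `db`. A finite-difference check on J suggests that this placement is consistent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, b, X, Y):
    # J is the mean of the per-example losses L, so this is where 1/m lives.
    AL = sigmoid(W @ X + b)
    losses = -(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))  # shape (1, m)
    return losses.mean()

# Toy data (assumed for illustration): n features, m examples.
rng = np.random.default_rng(0)
n, m = 3, 5
X = rng.normal(size=(n, m))
Y = (rng.random(size=(1, m)) > 0.5).astype(float)
W = rng.normal(size=(1, n)) * 0.1
b = np.zeros((1, 1))

AL = sigmoid(W @ X + b)

# dL/dAL, one column per training example: no 1/m here.
dAL = -(Y / AL) + (1 - Y) / (1 - AL)

# Chain rule through the sigmoid; this simplifies to AL - Y.
dZ = dAL * AL * (1 - AL)

# The average over examples (the 1/m) enters only in the parameter gradients.
dW = (dZ @ X.T) / m
db = dZ.sum(axis=1, keepdims=True) / m

# Central finite-difference estimate of dJ/dW for comparison.
eps = 1e-6
dW_num = np.zeros_like(W)
for j in range(n):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[0, j] += eps
    W_minus[0, j] -= eps
    dW_num[0, j] = (cost(W_plus, b, X, Y) - cost(W_minus, b, X, Y)) / (2 * eps)

print(np.allclose(dW, dW_num, atol=1e-6))
```

If you put a 1/m into `dAL` as well, the averaging would be applied twice and `dW` would no longer match the numerical gradient of J.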

