DERIVATIVES OF J OR L?

Carlitos28 · February 22, 2025, 5:29pm

Hello everybody,
I would like your help with a doubt.
When the professor writes dZ, dW and db, does he mean the derivative of the cost function (J) or the derivative of the loss function (L)?
I ask because when I took the partial derivative of J with respect to Z, I got (1/m)*(A - Y), which is different from dZ = A - Y.
I appreciate your help.

TMosh · February 22, 2025, 5:45pm

I think the difference is that “L” is for one example, and “J” is the average over all of the examples (the sum of the L values multiplied by 1/m).

Since you want to minimize the cost, in practice we use J. The only difference is the constant 1/m factor.
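A minimal sketch of that relationship in plain Python, assuming the usual sigmoid output and cross-entropy loss (the toy values of Z and Y below are hypothetical): L is computed once per example, and J is just their average, which is where the 1/m factor comes from.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def loss(a: float, y: float) -> float:
    """Cross-entropy loss L for a single example."""
    return -y * math.log(a) - (1 - y) * math.log(1 - a)


def cost(A: list[float], Y: list[float]) -> float:
    """Cost J: the average of the per-example losses over the m examples."""
    m = len(Y)
    return sum(loss(a, y) for a, y in zip(A, Y)) / m


# Hypothetical toy data: pre-activations Z and labels Y for m = 4 examples.
Z = [0.5, -1.2, 2.0, 0.1]
Y = [1, 0, 1, 0]
A = [sigmoid(z) for z in Z]

losses = [loss(a, y) for a, y in zip(A, Y)]
print("per-example L:", losses)
print("J = (1/m) * sum of L:", cost(A, Y))
```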

Carlitos28 · February 22, 2025, 6:28pm

Yes, the difference is exactly what you said.
What I want to know is why dZ is A - Y in the video. Maybe it refers to the derivative of the loss function?

TMosh · February 22, 2025, 6:30pm

The videos are not mathematically perfect or consistent. I would not be overly concerned about it.

F0ngTr4n11 · February 22, 2025, 7:05pm

Derivative of J with respect to A: dJ/dA = (1/m) * (-y/A + (1-y)/(1-A))
Derivative of J with respect to Z: dJ/dZ = (dA/dZ) * (dJ/dA)
= A*(1-A) * (1/m) * (-y/A + (1-y)/(1-A))
= A - y
Note: This is for sigmoid function: A = sigmoid(Z)

F0ngTr4n11 · February 22, 2025, 7:06pm

I forgot to carry the (1/m) factor into the last line: it should be (1/m) * (A - y).
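For reference, the same chain-rule steps can be written per example, using a superscript (i) for the i-th of m examples. Because z^(i) appears only in L^(i), differentiating J with respect to z^(i) picks up exactly the 1/m factor and nothing else, which is the entire difference between A - Y and (1/m)*(A - Y) in this thread:

```latex
\begin{aligned}
L^{(i)} &= -y^{(i)}\log a^{(i)} - \bigl(1-y^{(i)}\bigr)\log\bigl(1-a^{(i)}\bigr),
  \qquad a^{(i)} = \sigma\bigl(z^{(i)}\bigr) \\
J &= \frac{1}{m}\sum_{i=1}^{m} L^{(i)} \\
\frac{\partial L^{(i)}}{\partial z^{(i)}}
  &= \frac{\partial a^{(i)}}{\partial z^{(i)}}\,\frac{\partial L^{(i)}}{\partial a^{(i)}}
   = a^{(i)}\bigl(1-a^{(i)}\bigr)\left(-\frac{y^{(i)}}{a^{(i)}} + \frac{1-y^{(i)}}{1-a^{(i)}}\right) \\
  &= -y^{(i)}\bigl(1-a^{(i)}\bigr) + \bigl(1-y^{(i)}\bigr)a^{(i)}
   = a^{(i)} - y^{(i)} \\
\frac{\partial J}{\partial z^{(i)}}
  &= \frac{1}{m}\,\frac{\partial L^{(i)}}{\partial z^{(i)}}
   = \frac{1}{m}\bigl(a^{(i)} - y^{(i)}\bigr)
\end{aligned}
```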

F0ngTr4n11 · February 22, 2025, 7:08pm

Quoting F0ngTr4n11 (post 5, topic 774795):
“Note: This is for sigmoid function: A = sigmoid(Z)”

and the cost function:
J = (1/m) * sum( -Y*log(A) - (1-Y)*log(1-A) ), where the sum runs over all m training examples
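Using this cost function (averaged over the m examples), one can confirm numerically that the gradient of J with respect to each z is (1/m)*(A - Y), while the lecture's dZ = A - Y is the per-example derivative of L. The sketch below uses a central finite difference; the helper names and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def cost_from_z(Z, Y):
    """J(Z) = (1/m) * sum of -y*log(a) - (1-y)*log(1-a), with a = sigmoid(z)."""
    m = len(Y)
    total = 0.0
    for z, y in zip(Z, Y):
        a = sigmoid(z)
        total += -y * math.log(a) - (1 - y) * math.log(1 - a)
    return total / m


def numerical_grad(f, Z, eps=1e-6):
    """Central-difference estimate of df/dz_i for each component of Z."""
    grads = []
    for i in range(len(Z)):
        z_plus = list(Z)
        z_plus[i] += eps
        z_minus = list(Z)
        z_minus[i] -= eps
        grads.append((f(z_plus) - f(z_minus)) / (2 * eps))
    return grads


# Hypothetical toy data for m = 4 examples.
Z = [0.5, -1.2, 2.0, 0.1]
Y = [1, 0, 1, 0]
m = len(Y)
A = [sigmoid(z) for z in Z]

num = numerical_grad(lambda z: cost_from_z(z, Y), Z)
dJ_dZ = [(a - y) / m for a, y in zip(A, Y)]  # analytic derivative of J
dZ = [a - y for a, y in zip(A, Y)]           # lecture's dZ: derivative of L

for n, j, d in zip(num, dJ_dZ, dZ):
    print(f"numerical {n:.6f}   (1/m)(A-Y) {j:.6f}   A-Y {d:.6f}")
```

The numerical column should match (1/m)(A-Y), not A-Y, which is the point Carlitos28 raised.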

paulinpaloalto · February 22, 2025, 8:03pm

Yes, dZ is the derivative of L. As Tom says, Prof Ng is not consistent in his use of the “d” prefix to indicate a gradient. But here’s the way to tell:

The only cases in which the gradients are derivatives of J are the dW and db values. Those are the gradients we actually apply to update the parameters W^{[l]} and b^{[l]}.

All other gradients we see in the formulas are just Chain Rule factors that are used to calculate dW and db, so they are usually derivatives of L or of some other intermediate value. The other way you can tell is from their shape: if an array has one column per training example (m columns, like dZ^{[l]}), you’re looking at per-example derivatives of L.
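To make the distinction concrete, here is a minimal sketch for single-unit logistic regression, a simplification of the multi-layer W^{[l]}, b^{[l]} case. It stores each example as a Python list of features instead of a column of a NumPy matrix; that layout and the function names are assumptions of this sketch, not the course's code. dZ holds one derivative of L per example, and the 1/m only enters when dZ is combined into dw and db, which are derivatives of J.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(X, w, b):
    """Activations a = sigmoid(w . x + b) for each example x in X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]


def cost(X, Y, w, b):
    """Cost J: average cross-entropy loss over the m examples."""
    A = forward(X, w, b)
    return sum(-y * math.log(a) - (1 - y) * math.log(1 - a)
               for a, y in zip(A, Y)) / len(Y)


def backward(X, Y, w, b):
    """Gradients for single-unit logistic regression.

    Returns (dZ, dw, db): dZ[i] is dL/dz for example i (no 1/m),
    while dw and db are derivatives of the cost J (with 1/m).
    """
    m, n = len(X), len(w)
    A = forward(X, w, b)
    dZ = [a - y for a, y in zip(A, Y)]  # chain-rule factor: derivative of L
    dw = [sum(dZ[i] * X[i][j] for i in range(m)) / m for j in range(n)]
    db = sum(dZ) / m                    # derivative of J
    return dZ, dw, db


# Hypothetical toy data: m = 3 examples with n = 2 features each.
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
Y = [1, 0, 1]
w = [0.1, -0.2]
b = 0.05

dZ, dw, db = backward(X, Y, w, b)
print("dZ (one entry per example):", dZ)
print("dw:", dw, "db:", db)
```

In the course's vectorized NumPy form, the dw and db lines become a matrix product with dZ and a sum over the example axis, each scaled by 1/m; the loops here only keep the sketch within the standard library.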

This has been discussed many times before, e.g. here and here and here.

Topic	Category	Replies	Views	Activity
W3_Vectorization of dZ[2] equations	Neural Networks and Deep Learning	5	558	March 31, 2023
Optional video explaining backpropagation of C1 : dL/dZ[2] = A[2]- y?	Neural Networks and Deep Learning	4	500	August 18, 2023
Derivative of a (da/dz) in Deep Learning Course 1	Neural Networks and Deep Learning	2	527	January 29, 2023
Derivation of formula for dZ[2]	Neural Networks and Deep Learning	2	591	May 19, 2023
Derivative of dW and db	Neural Networks and Deep Learning (week-3)	2	247	May 22, 2024
