C2_W2_Computation graph (Optional)

Sai_Shodhan_Rao · March 4, 2023, 1:00pm

Hello,

How can the derivative of J = (1/2)d^2 be 2? It should be 1, since the 1/2 will be multiplied by 2. I'm attaching an image for your reference.

AbdElRhaman_Fakhry · March 4, 2023, 6:48pm

Hi @Sai_Shodhan_Rao
Welcome to the community!

You are right; that is explained in the optional lab, as in the image below. But I think the prof said that to show how J would change if we change only the variable d, without the factor of one half.

Best Regards,
Abdelrahman

Sai_Shodhan_Rao · March 5, 2023, 3:16am

Thank you for the response @AbdElRhaman_Fakhry.

So, ideally, dJ doesn’t change at all irrespective of d since it will always be divided by 2. dJ changes at the same rate as d. Then what is the use of having its derivative here since it changes exactly like the input?

Also, I would like to know how these derivatives (or backpropagation) help to do the entire math in n+p computations instead of the n*p needed with forward propagation.

AbdElRhaman_Fakhry · March 6, 2023, 6:37pm

Hi @Sai_Shodhan_Rao
We do that to stay general across all cost functions: there are many types of cost function, and for each type dJ changes according to that type's own equation.

If you compute the partial derivatives for gradient descent from left to right, you calculate the partial derivative for each parameter on its own, and you don't benefit from the chain rule or reuse past calculations in the next ones. This is inefficient: if you have N nodes and P parameters, the number of calculations needed to update the parameters is N * P. But if you compute the derivatives from the back (right to left), from the cost function to the parameters, in other words if you use the chain rule to reuse past calculations in the next ones, the number of calculations needed to update the parameters is N + P. That is the efficient way, like the image above.
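To make the N + P versus N * P point concrete, here is a minimal sketch of a small computation graph. The graph (c = w*x, a = c + b, d = a - y, J = (1/2)d^2), the input values, and all function names are assumptions chosen for illustration, not taken from the image. The backward pass visits each node once and reuses the earlier results, while the finite-difference version reruns the whole forward pass once per parameter.

```python
def forward(w, b, x, y):
    """Forward pass through the assumed graph; returns every node value."""
    c = w * x          # node 1
    a = c + b          # node 2
    d = a - y          # node 3
    J = 0.5 * d ** 2   # node 4 (cost)
    return c, a, d, J


def backward(w, b, x, y):
    """Right-to-left pass: each node is visited once and results are reused."""
    c, a, d, J = forward(w, b, x, y)
    dJ_dd = d              # derivative of (1/2)d^2 with respect to d
    dJ_da = dJ_dd * 1.0    # d = a - y, so dd/da = 1
    dJ_dc = dJ_da * 1.0    # a = c + b, so da/dc = 1
    dJ_db = dJ_da * 1.0    # a = c + b, so da/db = 1 (reuses dJ_da)
    dJ_dw = dJ_dc * x      # c = w * x, so dc/dw = x (reuses dJ_dc)
    return dJ_dw, dJ_db


def finite_difference_grads(w, b, x, y, eps=1e-6):
    """Per-parameter approach: one extra full forward pass for each parameter."""
    J0 = forward(w, b, x, y)[3]
    dJ_dw = (forward(w + eps, b, x, y)[3] - J0) / eps
    dJ_db = (forward(w, b + eps, x, y)[3] - J0) / eps
    return dJ_dw, dJ_db


# Illustrative values (assumed): w=2, b=8, x=-2, y=2, which gives d = 2.
print(backward(2.0, 8.0, -2.0, 2.0))                 # -> (-4.0, 2.0)
print(finite_difference_grads(2.0, 8.0, -2.0, 2.0))  # approximately the same
```

With only two parameters the saving is small, but the per-parameter approach repeats the whole forward pass for every parameter, which is where the N * P count comes from.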

Best Regards,
Abdelrahman

Sai_Shodhan_Rao · March 7, 2023, 3:34am

@AbdElRhaman_Fakhry,

Got it. Thank you for the explanation!

James_Harmon · March 16, 2023, 10:59pm

I was confused by this as well and assumed that the derivative of J w.r.t. d was 1, but this is not correct. The derivative of J(d) = (1/2)d^2 w.r.t. d is J’(d) = d. Now here’s the part that I missed: the rate of change depends on the value of d. We need to plug the actual value of d into J’(d) = d. So in the example that you provided, when d = 2, the rate of change is 2. That is how the prof came to the value 2.
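A quick numerical check of this, as a minimal sketch (the function names here are hypothetical, chosen only for illustration): nudge d by a tiny amount and see how much J changes per unit of change in d. The result tracks the value of d rather than staying fixed at 1.

```python
def J(d):
    """Cost as a function of d: J = (1/2) d^2."""
    return 0.5 * d ** 2


def J_prime(d):
    """Analytic derivative of J with respect to d."""
    return d


def numerical_derivative(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# Compare the analytic and numerical derivative at a few values of d.
for d in (1.0, 2.0, 3.0):
    print(d, J_prime(d), numerical_derivative(J, d))
```

At d = 2 both columns come out at about 2, which matches the value in the lecture; at d = 1 they are about 1, which is only a coincidence of that particular input.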
