C2_W2_Computation graph (Optional)


How can the derivative of J = (1/2)d^2 be 2? It should be 1, since the 1/2 is multiplied by 2. I am attaching an image for your reference.

Hi @Sai_Shodhan_Rao
Welcome to the community!

You are right; this is explained in the optional lab, as in the image below. I think the professor presented it that way to show how J would change if we changed only the variable d, without the factor of one half.

Best Regards,

Thank you for the response @AbdElRhaman_Fakhry.

So, ideally, dJ doesn’t change at all irrespective of d since it will always be divided by 2. dJ changes at the same rate as d. Then what is the use of having its derivative here since it changes exactly like the input?

Also, I would like to know how these derivatives (or backpropagation) help to do the entire math in n+p computations instead of n*p, as in the case of forward propagation.

Hi @Sai_Shodhan_Rao
We do that to stay general across all cost functions: there are many types of cost function, and for each type dJ changes according to that function's own equation.

If you compute the partial derivatives for gradient descent from left to right, you calculate the partial derivative for each parameter on its own, without benefiting from the chain rule or reusing past calculations in the next ones. This is inefficient: with N nodes and P parameters, the number of calculations needed to update the parameters is roughly N * P. If instead you compute the derivatives from right to left, from the cost function back to the parameters, you use the chain rule to reuse past calculations in the next ones, and the number of calculations needed to update the parameters is roughly N + P. That is the efficient way, as in the image above.
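To make the reuse concrete, here is a minimal sketch of a forward and backward pass on a tiny computation graph. The values of w, b, x and y and the node names c and a are illustrative assumptions, chosen only so that d = 2 as in the question; they are not taken from this thread.

```python
# Illustrative values (assumptions), chosen so that d = 2.
w, b, x, y = 2.0, 8.0, -2.0, 2.0

# Forward pass, left to right: each node is computed once.
c = w * x          # c = -4
a = c + b          # a = 4
d = a - y          # d = 2
J = 0.5 * d ** 2   # J = 2

# Backward pass, right to left: each derivative reuses the previous one.
dJ_dd = d              # derivative of (1/2) d^2 with respect to d
dJ_da = dJ_dd * 1.0    # d = a - y, so dd/da = 1
dJ_dc = dJ_da * 1.0    # a = c + b, so da/dc = 1
dJ_db = dJ_da * 1.0    # a = c + b, so da/db = 1
dJ_dw = dJ_dc * x      # c = w * x, so dc/dw = x

print(J, dJ_dd, dJ_db, dJ_dw)  # prints: 2.0 2.0 2.0 -4.0
```

Each node's derivative is computed once and then reused: dJ_da feeds both dJ_dc and dJ_db, so adding a parameter adds one step rather than a whole new pass through the graph.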

Best Regards,


Got it. Thank you for the explanation!!

I was confused by this as well and assumed that the derivative of J w.r.t. d was 1, but this is not correct. The derivative of J(d) = (1/2)d^2 w.r.t. d is J’(d) = d. Here’s the part that I missed: the rate of change depends on the value of d, so we need to plug the actual value of d into J’(d) = d. In the example that you provided, d = 2, so the rate of change is 2. That is how the prof came to the value 2.
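One way to check this numerically is to nudge d by a small amount and see how much J changes. The sketch below is only an illustration: the helper names `cost` and `numerical_derivative` and the step size `eps` are assumptions, not something from the course lab.

```python
def cost(d):
    """J(d) = (1/2) * d^2, the cost from the question."""
    return 0.5 * d ** 2


def numerical_derivative(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# The estimated slope tracks the value of d itself, so it is 2 at d = 2.
for d in (1.0, 2.0, 3.0):
    estimate = numerical_derivative(cost, d)
    print(f"d = {d}: J'(d) is approximately {estimate:.4f}")
```

The estimate comes out close to d at each point, which matches J’(d) = d: the slope is 1 only at d = 1, and it is 2 at d = 2.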