W2_A2_Calculation of Partial derivatives

Alexis_V · May 28, 2023, 12:36pm

Does anyone understand why the derivative da/dz equals a(1-a)?
And how are dL/dw1, dL/dw2, and dL/db calculated?

saifkhanengr · May 28, 2023, 12:41pm

Here is the answer. Also, check out this YouTube playlist by Eddy Shyu. Maybe @paulinpaloalto will add some more reading material.
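
As a quick sketch of the standard derivation, assuming a is the sigmoid activation of z (which is what da/dz = a(1-a) implies):

a = \sigma(z) = \frac{1}{1 + e^{-z}}

\frac{da}{dz} = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \times \frac{e^{-z}}{1 + e^{-z}} = a(1 - a)

The last step uses 1 - a = \frac{e^{-z}}{1 + e^{-z}}.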

Alexis_V · May 28, 2023, 1:21pm

I found the answer for dL/dz in the optional material: Derivation of DL/dz

But feel free to comment on the calculation of dL/dw1, dL/dw2 and dL/db.

paulinpaloalto · May 28, 2023, 2:15pm

Here’s another thread with links to material about the derivations of the backprop formulas. It also points to this thread about matrix calculus in general that is helpful. As you’d expect, matrix calculus is based on the same principles as univariate calculus, but things get more complicated in higher dimensions, even beyond the usual notions of partial derivatives.

Alexis_V · May 28, 2023, 2:40pm

thanks a lot @paulinpaloalto !

saifkhanengr · May 28, 2023, 3:58pm

Paul sir! Please correct me if I am wrong.

Suppose we have a three-layer model (2 hidden layers and 1 output layer). The chain-rule expansions for dZ1, dW1, and db1 are:

\frac{dL}{dZ1} = \frac{dL}{dA3} \times \frac{dA3}{dZ3}\times \frac{dZ3}{dA2}\times \frac{dA2}{dZ2}\times \frac{dA1}{dZ1}

\frac{dL}{dW1} = \frac{dL}{dA3} \times \frac{dA3}{dZ3}\times \frac{dZ3}{dA2}\times \frac{dA2}{dZ2}\times \frac{dZ2}{dA1}\times \frac{dA1}{dZ1}\times\frac{dZ1}{dW1}

\frac{dL}{db1} = \frac{dL}{dA3} \times \frac{dA3}{dZ3}\times \frac{dZ3}{dA2}\times \frac{dA2}{dZ2}\times \frac{dZ2}{dA1}\times \frac{dA1}{dZ1}\times\frac{dZ1}{db1}

In dW1, we do not take derivative w.r.t. any other weights like W2 or W3, right? Same for b.

Elemento · May 29, 2023, 5:38am

Hey @saifkhanengr,
Although not related to your query, I believe that your first equation has one missing term. I think it should be as follows:

\frac{dL}{dZ1} = \frac{dL}{dA3} \times \frac{dA3}{dZ3}\times \frac{dZ3}{dA2}\times \frac{dA2}{dZ2}\times \frac{dZ2}{dA1} \times \frac{dA1}{dZ1}

\frac{dL}{dW1} = \frac{dL}{dA3} \times \frac{dA3}{dZ3}\times \frac{dZ3}{dA2}\times \frac{dA2}{dZ2}\times \frac{dZ2}{dA1}\times \frac{dA1}{dZ1}\times\frac{dZ1}{dW1}

\frac{dL}{db1} = \frac{dL}{dA3} \times \frac{dA3}{dZ3}\times \frac{dZ3}{dA2}\times \frac{dA2}{dZ2}\times \frac{dZ2}{dA1}\times \frac{dA1}{dZ1}\times\frac{dZ1}{db1}

And as for your query, you are correct indeed. For computing dW_1, we do not take the derivative wrt any other weights or biases. Same goes for db_1. The reason is simple too.

Consider Z_2 = W_2^T A_1 + b_2. To compute \frac{dL}{dW1}, we need \frac{dZ1}{dW1}, and since A1 is computed from Z1, we also need \frac{dA1}{dZ1}. W1 affects the loss only through the path Z1, A1, Z2 and onward, so the factor that links layer 2 back to layer 1 must be \frac{dZ2}{dA1}. If we took \frac{dZ2}{dW2} instead, the terms in the chain would no longer match up, and back-propagation would not work. I hope this resolves your issue.
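
To make this concrete, here is a minimal Python sketch that multiplies the seven factors of \frac{dL}{dW1} and compares the product with a finite-difference estimate. It rests on assumptions that are not from the course: every layer is a single scalar unit, every activation is a sigmoid, the loss is binary cross-entropy, and all the numeric values are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    """Forward pass through three scalar layers (assumed: sigmoid everywhere)."""
    w1, b1, w2, b2, w3, b3 = params
    z1 = w1 * x + b1
    a1 = sigmoid(z1)
    z2 = w2 * a1 + b2
    a2 = sigmoid(z2)
    z3 = w3 * a2 + b3
    a3 = sigmoid(z3)
    return z1, a1, z2, a2, z3, a3

def loss(x, y, params):
    """Binary cross-entropy on the final activation (assumed loss)."""
    a3 = forward(x, params)[-1]
    return -(y * math.log(a3) + (1 - y) * math.log(1 - a3))

def chain_rule_dw1(x, y, params, use_wrong_factor=False):
    """Product of the seven chain-rule factors for dL/dW1."""
    w1, b1, w2, b2, w3, b3 = params
    z1, a1, z2, a2, z3, a3 = forward(x, params)
    dL_dA3 = -(y / a3) + (1 - y) / (1 - a3)
    dA3_dZ3 = a3 * (1 - a3)
    dZ3_dA2 = w3
    dA2_dZ2 = a2 * (1 - a2)
    dZ2_dA1 = w2   # correct link from layer 2 back to layer 1
    dZ2_dW2 = a1   # the mismatched factor discussed above
    dA1_dZ1 = a1 * (1 - a1)
    dZ1_dW1 = x
    link = dZ2_dW2 if use_wrong_factor else dZ2_dA1
    return dL_dA3 * dA3_dZ3 * dZ3_dA2 * dA2_dZ2 * link * dA1_dZ1 * dZ1_dW1

def numeric_dw1(x, y, params, eps=1e-6):
    """Central finite-difference estimate of dL/dW1."""
    plus = list(params)
    minus = list(params)
    plus[0] += eps
    minus[0] -= eps
    return (loss(x, y, plus) - loss(x, y, minus)) / (2 * eps)

# Arbitrary illustrative values: [w1, b1, w2, b2, w3, b3]
x, y = 0.5, 1.0
params = [0.3, -0.1, 0.8, 0.2, -0.5, 0.1]

analytic = chain_rule_dw1(x, y, params)
mismatched = chain_rule_dw1(x, y, params, use_wrong_factor=True)
numeric = numeric_dw1(x, y, params)
print("chain rule:", analytic)
print("finite difference:", numeric)
print("with dZ2/dW2 swapped in:", mismatched)
```

The chain-rule product should agree with the finite-difference estimate to several decimal places, while the version with \frac{dZ2}{dW2} swapped in should not.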

Cheers,
Elemento

saifkhanengr · May 29, 2023, 6:27am

Oh, thanks for catching the missing term, Elemento. And thanks for clarifying my doubts.

Alexis_V · May 29, 2023, 4:43pm

Hey, thanks @saifkhanengr, I understand da/dz now,
but is there any explanation for dL/dw1, dL/dw2, and dL/db?

saifkhanengr · May 30, 2023, 3:01am

Elemento gave us the correct equations. You can work out the remaining equations by following the same pattern.
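
As a rough illustration of that pattern, here is a small Python sketch for dL/dw1, dL/dw2 and dL/db. It assumes the setup is a single logistic unit with two features, z = w1*x1 + w2*x2 + b, a = sigmoid(z), and the loss L = -(y*log(a) + (1-y)*log(1-a)). Under that assumption the chain rule gives dL/dz = a - y, and then dL/dw1 = x1*(a - y), dL/dw2 = x2*(a - y), dL/db = a - y. The function names and numeric values are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w1, w2, b, x1, x2, y):
    """Cross-entropy loss of a single logistic unit (assumed setup)."""
    a = sigmoid(w1 * x1 + w2 * x2 + b)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def gradients(w1, w2, b, x1, x2, y):
    """Chain-rule gradients: dL/dz = dL/da * da/dz, then dz/dw1, dz/dw2, dz/db."""
    a = sigmoid(w1 * x1 + w2 * x2 + b)
    dL_da = -(y / a) + (1 - y) / (1 - a)
    da_dz = a * (1 - a)
    dz = dL_da * da_dz          # simplifies algebraically to a - y
    return {"dw1": x1 * dz,     # dz/dw1 = x1
            "dw2": x2 * dz,     # dz/dw2 = x2
            "db": dz}           # dz/db = 1

def numeric_gradients(w1, w2, b, x1, x2, y, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    return {
        "dw1": (loss(w1 + eps, w2, b, x1, x2, y) - loss(w1 - eps, w2, b, x1, x2, y)) / (2 * eps),
        "dw2": (loss(w1, w2 + eps, b, x1, x2, y) - loss(w1, w2 - eps, b, x1, x2, y)) / (2 * eps),
        "db": (loss(w1, w2, b + eps, x1, x2, y) - loss(w1, w2, b - eps, x1, x2, y)) / (2 * eps),
    }

# Arbitrary illustrative values
w1, w2, b = 0.4, -0.7, 0.1
x1, x2, y = 1.5, 2.0, 1.0

grads = gradients(w1, w2, b, x1, x2, y)
checks = numeric_gradients(w1, w2, b, x1, x2, y)
for name in ("dw1", "dw2", "db"):
    print(name, grads[name], checks[name])
```

Each analytic value should agree closely with its finite-difference counterpart.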

paulinpaloalto · May 30, 2023, 4:32am

And if you want to dig deeper you can follow the links that I gave earlier on this thread.

Alexis_V · June 8, 2023, 1:38pm

Hi all,

I finally found an article that also explains the derivatives with respect to w1, w2 and b:
https://vincentblog.xyz/posts/backpropagation-and-gradient-descent

Feel free to comment.
Alexis

Alexis_V · July 24, 2023, 8:52pm

Hi all,
I just noticed that in the article at https://vincentblog.xyz/posts/backpropagation-and-gradient-descent

the first layer also uses a sigmoid activation, which is not the case in the course (it uses tanh).
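
For comparison, a short sketch of how the activation derivative differs between the two cases, assuming a denotes the activation of z in that layer:

\frac{da}{dz} = 1 - \tanh^2(z) = 1 - a^2 \quad \text{(tanh)}

\frac{da}{dz} = a(1 - a) \quad \text{(sigmoid)}

The rest of the chain rule keeps the same structure; only the \frac{dA1}{dZ1} factor changes.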
