Backprop derivative after output layer - Course 1: Week 3

Kadri_Mufti · July 8, 2022, 10:36pm

In week 3 of the first course, in the lecture videos on gradient descent and backpropagation intuition, it is not explained how the equation for “dz[1]” is arrived at:

Can someone provide an explanation or what the full chain rule equation would look like?

Thank you

anon57530071 · July 9, 2022, 5:15am

Welcome to the community.

If Andrew’s title includes ‘intuition’, it scares me…

Anyway, let me try to fill the gap.

Here is an overview of the network.

The equation marked in blue gives dz^{[l-1]} in the left neuron from dz^{[l]} in the right neuron. Keep in mind that everything originates from the loss function (cost function) at the far right and is “back-propagated” from there.

Let’s start from dz^{[l-1]}. As you know, dz is shorthand for \frac{\partial\mathcal{L}}{\partial z}. First, using the chain rule, we split this into two partial derivatives.

\begin{align} dz^{[l-1]} &= \frac{\partial \mathcal{L}}{\partial z^{[l-1]}} \\ &= \frac{\partial \mathcal{L}}{\partial a^{[l-1]}}\frac{\partial a^{[l-1]}}{\partial z^{[l-1]}} \end{align}

To calculate the first term, we use the chain rule again. Note also that the input to a neuron comes from multiple neurons, each with a different weight, like this.

Here, z^{[l]} can be written as follows.

\begin{align} z^{[l]} &= w^{[l]}a^{[l-1]} + b^{[l]} \\ &= \sum_{i=1}^{n}w_i^{[l]}a_i^{[l-1]} + b^{[l]} \end{align}

Now we are ready to calculate \frac{\partial \mathcal{L}}{\partial a^{[l-1]}} with the chain rule.

\begin{align} \frac{\partial \mathcal{L}}{\partial a^{[l-1]}} &= \frac{\partial \mathcal{L}}{\partial z^{[l]} }\frac{\partial z^{[l]}}{\partial a^{[l-1]}} =\sum_{i=1}^{n}dz^{[l]}_{i}w^{[l]}_i \end{align}

Here the sum runs over the neurons i in layer l that a^{[l-1]} feeds into. We can also rewrite this sum as a “dot product”. To do that, we need to transpose one of the two factors. As we want to keep dz in its original form, let’s transpose w.
Now the last equation can be rewritten as follows.

\begin{align} \frac{\partial \mathcal{L}}{\partial a^{[l-1]}} &= \frac{\partial \mathcal{L}}{\partial z^{[l]} }\frac{\partial z^{[l]}}{\partial a^{[l-1]}} =\sum_{i=1}^{n}dz^{[l]}_{i}w^{[l]}_i = {w^{[l]}}^Tdz^{[l]} \end{align}
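If it helps to see the sum and the transposed product side by side, here is a minimal sketch in plain Python. The sizes, weights, and dz values are made up for illustration, and the convention that W[i][j] connects neuron j of layer l-1 to neuron i of layer l is my assumption about the indexing. In the course you would use NumPy instead, something like np.dot(W.T, dz).

```python
# Hypothetical layer sizes: 3 neurons in layer l, 2 neurons in layer l-1.
# W[i][j] is assumed to connect neuron j of layer l-1 to neuron i of layer l.
W = [[0.1, -0.2],
     [0.4, 0.5],
     [-0.3, 0.6]]
dz = [1.0, -2.0, 0.5]  # made-up dL/dz values for the 3 neurons of layer l


def da_prev_by_sum(W, dz):
    """Explicit sum: for each neuron j of layer l-1, add dz_i * w_ij over i."""
    n_l, n_prev = len(W), len(W[0])
    return [sum(dz[i] * W[i][j] for i in range(n_l)) for j in range(n_prev)]


def transpose(M):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    """Matrix-vector product for a list-of-lists matrix and a list vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


da_sum = da_prev_by_sum(W, dz)
da_mat = matvec(transpose(W), dz)  # the W^T dz form
print(da_sum, da_mat)
```

Both lists should agree, which is all the transpose is doing: it lines up each neuron of layer l-1 with the weights leaving it.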

Next, let’s look at the second term, \frac{\partial a^{[l-1]}}{\partial z^{[l-1]}}, which is relatively simple.
Since a^{[l-1]} = g^{[l-1]}(z^{[l-1]}), it can be calculated as follows.

\begin{align} \frac{\partial a^{[l-1]}}{\partial z^{[l-1]}} &= g^{[l-1]'}(z^{[l-1]}) \\ \end{align}
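As a concrete case, here is a small sketch assuming the activation g is tanh (that choice is my assumption for illustration; the derivation above works for any differentiable g). It compares the known derivative 1 - tanh(z)^2 against a finite-difference estimate at an arbitrary point.

```python
import math


def g(z):
    """Assumed activation for this sketch: tanh."""
    return math.tanh(z)


def g_prime(z):
    """Analytic derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2


def numeric_derivative(f, z, eps=1e-6):
    """Central finite-difference estimate of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)


z = 0.7  # arbitrary test point
analytic = g_prime(z)
numeric = numeric_derivative(g, z)
print(analytic, numeric)
```

The two numbers should match to several decimal places.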

Now we can put it all together:

\begin{align} dz^{[l-1]} &= \frac{\partial \mathcal{L}}{\partial z^{[l-1]}} \\ &= \frac{\partial \mathcal{L}}{\partial a^{[l-1]}}\frac{\partial a^{[l-1]}}{\partial z^{[l-1]}} \\ \\ &= {w^{[l]}}^Tdz^{[l]}*g^{[l-1]'}(z^{[l-1]}) \end{align}

Here * denotes element-wise multiplication. Then you just need to set l=2, and you get:

dz^{[1]}= {w^{[2]}}^Tdz^{[2]}*g^{[1]'}(z^{[1]})
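If you want to convince yourself numerically, here is a rough gradient-check sketch in plain Python. Everything in it is an assumption for illustration: a hidden layer of 3 tanh units, one sigmoid output unit, a cross-entropy loss (so that dz^{[2]} = a^{[2]} - y), and made-up values for W2, b2, y and z1. It compares the formula above with a finite-difference estimate of \frac{\partial\mathcal{L}}{\partial z^{[1]}}.

```python
import math

# Made-up parameters for a tiny network: 3 hidden units, 1 output unit.
W2 = [[0.3, -0.5, 0.8]]  # shape (1, 3)
b2 = [0.1]
y = 1.0                  # hypothetical label
z1 = [0.2, -0.4, 0.9]    # hypothetical hidden pre-activations


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def forward_from_z1(z1):
    """Run the network from z1 onward; return (loss, a2)."""
    a1 = [math.tanh(v) for v in z1]
    z2 = sum(w * a for w, a in zip(W2[0], a1)) + b2[0]
    a2 = sigmoid(z2)
    loss = -(y * math.log(a2) + (1 - y) * math.log(1 - a2))
    return loss, a2


def analytic_dz1(z1):
    """dz1 = W2^T dz2 * g1'(z1), with g1 = tanh and dz2 = a2 - y."""
    _, a2 = forward_from_z1(z1)
    dz2 = [a2 - y]
    da1 = [sum(W2[i][j] * dz2[i] for i in range(len(W2)))
           for j in range(len(z1))]
    return [d * (1.0 - math.tanh(v) ** 2) for d, v in zip(da1, z1)]


def numeric_dz1(z1, eps=1e-6):
    """Central finite differences of the loss with respect to each z1[j]."""
    grads = []
    for j in range(len(z1)):
        plus, minus = list(z1), list(z1)
        plus[j] += eps
        minus[j] -= eps
        diff = forward_from_z1(plus)[0] - forward_from_z1(minus)[0]
        grads.append(diff / (2 * eps))
    return grads


dz1_formula = analytic_dz1(z1)
dz1_numeric = numeric_dz1(z1)
print(dz1_formula)
print(dz1_numeric)
```

The two printed lists should agree closely, which is a decent sanity check that the chain rule steps were combined correctly.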

Hope this helps.

sikisaif · June 21, 2024, 8:49pm

I see a [Math Processing Error] right in the places where it gets interesting. Can someone fix this please? Without that it is hard to follow. Thanks!

rmwkwok · June 22, 2024, 1:06am

Hello, @sikisaif,

I believe it is related to your browser. Maybe you can try a different browser or open it from a different device like your phone?

In any case, I will send you some screenshots.

Cheers,
Raymond

TMosh · June 22, 2024, 3:59am

That is a LaTeX processing error in your browser. Try clearing your cache and cookies and disabling your ad blocker. Sometimes that helps.
