# Backprop derivative after output layer - Course 1: Week 3

In week 3 of the first course, the lecture videos on gradient descent and backpropagation intuition do not explain how the equation for “dz[1]” is derived:

Can someone explain it, or show what the full chain-rule equation would look like?

Thank you

Welcome to the community.

If Andrew’s title includes ‘intuition’, it scares me…

Anyway, let me try to fill the gap.

Here is an overview of the network.

The equation marked in blue computes dz^{[l-1]} in the left neuron from dz^{[l]} in the right neuron. Keep in mind that every gradient starts from the loss (cost) function at the rightmost end of the network and is “back-propagated” toward the input from there.

Let’s start from dz^{[l-1]}. As you know, dz is shorthand for \frac{\partial\mathcal{L}}{\partial z}. First, using the chain rule, we split it into two partial derivatives.

\begin{align} dz^{[l-1]} &= \frac{\partial \mathcal{L}}{\partial z^{[l-1]}} \\ &= \frac{\partial \mathcal{L}}{\partial a^{[l-1]}}\frac{\partial a^{[l-1]}}{\partial z^{[l-1]}} \end{align}
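To see this split numerically, here is a minimal Python sketch for a single unit. It assumes, purely for illustration, a tanh activation and a squared-error loss \mathcal{L} = \frac{1}{2}(a - y)^2 (neither comes from the lecture); the product of the two factors should match a finite-difference estimate of \frac{\partial\mathcal{L}}{\partial z}.

```python
import math

# Assumed for illustration only: a = tanh(z) and L = 0.5 * (a - y)**2.
def loss_from_z(z, y):
    a = math.tanh(z)
    return 0.5 * (a - y) ** 2

def dz_chain_rule(z, y):
    a = math.tanh(z)
    dL_da = a - y           # first factor, dL/da, for the assumed loss
    da_dz = 1.0 - a ** 2    # second factor, da/dz, for tanh
    return dL_da * da_dz    # chain rule: dL/dz = dL/da * da/dz

def dz_numeric(z, y, eps=1e-6):
    # Central finite-difference estimate of dL/dz
    return (loss_from_z(z + eps, y) - loss_from_z(z - eps, y)) / (2 * eps)

# The two printed values should agree to many decimal places
print(dz_chain_rule(0.3, 1.0), dz_numeric(0.3, 1.0))
```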

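To calculate the first term, we use the chain rule again. We also need to be aware that the input to a neuron comes from multiple neurons, with a different weight for each, like this.

Here, the k-th unit of layer l (k = 1, \dots, n^{[l]}) computes z_k^{[l]} from all n^{[l-1]} activations of layer l-1, where w_{kj}^{[l]} is the weight from unit j of layer l-1 to unit k of layer l.

\begin{align} z^{[l]} &= w^{[l]}a^{[l-1]} + b^{[l]} \\ z_k^{[l]} &= \sum_{j=1}^{n^{[l-1]}}w_{kj}^{[l]}a_j^{[l-1]} + b_k^{[l]} \end{align}

Seen from the other side, each activation a_j^{[l-1]} feeds every unit of layer l, so a change in a_j^{[l-1]} reaches the loss through all n^{[l]} values z_k^{[l]}. Now we are ready to calculate \frac{\partial \mathcal{L}}{\partial a_j^{[l-1]}} with the chain rule, summing over those paths. Since \frac{\partial z_k^{[l]}}{\partial a_j^{[l-1]}} = w_{kj}^{[l]}, we get:

\begin{align} \frac{\partial \mathcal{L}}{\partial a_j^{[l-1]}} &= \sum_{k=1}^{n^{[l]}}\frac{\partial \mathcal{L}}{\partial z_k^{[l]} }\frac{\partial z_k^{[l]}}{\partial a_j^{[l-1]}} =\sum_{k=1}^{n^{[l]}}dz^{[l]}_{k}w^{[l]}_{kj} \end{align}

This sum is a dot product between dz^{[l]} and the j-th column of w^{[l]}. To write it for all j at once as a matrix product, we need to transpose one of the two. As we want to keep dz in its original (column) form, let’s transpose w.
Stacking the results for j = 1, \dots, n^{[l-1]}, the last equation can be rewritten as follows.

\begin{align} \frac{\partial \mathcal{L}}{\partial a^{[l-1]}} &= {w^{[l]}}^Tdz^{[l]}, \qquad \left({w^{[l]}}^Tdz^{[l]}\right)_j = \sum_{k=1}^{n^{[l]}}w^{[l]}_{kj}dz^{[l]}_{k} \end{align}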
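A small pure-Python sketch can confirm that the explicit chain-rule sum over the units of layer l and the transposed product give the same vector, and that both match a finite-difference derivative. The layer sizes, the weights, and the stand-in loss \mathcal{L} = \frac{1}{2}\sum_k (z_k^{[l]})^2 (chosen only so that dz_k^{[l]} = z_k^{[l]}) are all hypothetical; `w[k][j]` is the weight from unit j of layer l-1 to unit k of layer l.

```python
# Hypothetical sizes: layer l-1 has 2 units, layer l has 3 units.
# w[k][j] is the weight from unit j of layer l-1 to unit k of layer l.
w = [[0.2, -0.5],
     [0.7, 0.1],
     [-0.3, 0.4]]
b = [0.05, -0.1, 0.2]
a_prev = [0.6, -0.9]

def forward_z(a_prev):
    # z_k = sum_j w[k][j] * a_prev[j] + b[k]
    return [sum(w[k][j] * a_prev[j] for j in range(len(a_prev))) + b[k]
            for k in range(len(w))]

def loss(a_prev):
    # Stand-in for everything to the right of z^[l] (assumed, not from the lecture):
    # L = 0.5 * sum_k z_k**2, so dL/dz_k = z_k.
    return 0.5 * sum(zk ** 2 for zk in forward_z(a_prev))

def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

dz = forward_z(a_prev)  # dz_k = z_k for the assumed loss

# Explicit chain-rule sum: dL/da_j = sum_k dz_k * w[k][j]
da_sum = [sum(dz[k] * w[k][j] for k in range(len(w))) for j in range(len(a_prev))]
# The same quantity written as w^T dz
da_vec = matvec(transpose(w), dz)

def da_numeric(j, eps=1e-6):
    # Central finite difference of L with respect to a_prev[j]
    plus = list(a_prev)
    plus[j] += eps
    minus = list(a_prev)
    minus[j] -= eps
    return (loss(plus) - loss(minus)) / (2 * eps)

da_num = [da_numeric(j) for j in range(len(a_prev))]
print(da_sum, da_vec, da_num)
```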

Next comes the second term, \frac{\partial a^{[l-1]}}{\partial z^{[l-1]}}, which is relatively simple.
Since a^{[l-1]} = g^{[l-1]}(z^{[l-1]}) is applied element-wise, this term is just the derivative of the activation function:

\begin{align} \frac{\partial a^{[l-1]}}{\partial z^{[l-1]}} &= g^{[l-1]'}(z^{[l-1]}) \end{align}
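For the activations covered in week 3, g' has a simple closed form: a(1-a) for sigmoid and 1 - a^2 for tanh, while ReLU has derivative 1 for z > 0 and 0 for z < 0. The sketch below (plain Python, scalar inputs) compares the first two with finite differences; returning 0 for ReLU at exactly z = 0 is a convention assumed here, since the derivative is undefined at that point.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    a = sigmoid(z)
    return a * (1.0 - a)        # g'(z) = a(1 - a)

def tanh_prime(z):
    a = math.tanh(z)
    return 1.0 - a ** 2         # g'(z) = 1 - a^2

def relu_prime(z):
    # Undefined at z == 0; returning 0.0 there is an assumed convention
    return 1.0 if z > 0 else 0.0

def numeric_prime(g, z, eps=1e-6):
    # Central finite-difference estimate of g'(z)
    return (g(z + eps) - g(z - eps)) / (2 * eps)

for z in (-1.5, 0.2, 2.0):
    print(z, sigmoid_prime(z), numeric_prime(sigmoid, z),
          tanh_prime(z), numeric_prime(math.tanh, z), relu_prime(z))
```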

Now, we can put it all together. Here, * denotes the element-wise product, because each a_j^{[l-1]} depends only on its own z_j^{[l-1]}.

\begin{align} dz^{[l-1]} &= \frac{\partial \mathcal{L}}{\partial z^{[l-1]}} \\ &= \frac{\partial \mathcal{L}}{\partial a^{[l-1]}}\frac{\partial a^{[l-1]}}{\partial z^{[l-1]}} \\ &= {w^{[l]}}^Tdz^{[l]}*g^{[l-1]'}(z^{[l-1]}) \end{align}
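As a quick sanity check on the shapes, assume layer l has n^{[l]} units (a symbol introduced here for layer sizes), so that w^{[l]} is n^{[l]} \times n^{[l-1]}. The matrix product and the activation derivative then have the same shape, which is why they are combined element-wise, and the result matches the shape of z^{[l-1]}:

\begin{align} \underbrace{{w^{[l]}}^T}_{n^{[l-1]}\times n^{[l]}}\;\underbrace{dz^{[l]}}_{n^{[l]}\times 1} &\;\to\; n^{[l-1]}\times 1 \\ \underbrace{g^{[l-1]'}(z^{[l-1]})}_{n^{[l-1]}\times 1} &\;\to\; n^{[l-1]}\times 1 \\ \underbrace{dz^{[l-1]}}_{\text{element-wise product}} &\;\to\; n^{[l-1]}\times 1 \end{align}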

Then, you just need to set l=2, which gives:

dz^{[1]}= {w^{[2]}}^Tdz^{[2]}*g^{[1]'}(z^{[1]})
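As an end-to-end check, the sketch below builds a hypothetical two-layer network (3 tanh hidden units, 1 sigmoid output, binary cross-entropy loss, one example) and compares this formula for dz^{[1]} with a finite-difference derivative of the loss with respect to z^{[1]}. It uses dz^{[2]} = a^{[2]} - y, which holds for a sigmoid output with cross-entropy loss. All numbers are made up, and `W2` in the code plays the role of w^{[2]}.

```python
import math

# Hypothetical tiny network (values made up for illustration):
# layer 1: 3 tanh units, layer 2: 1 sigmoid unit, binary cross-entropy loss.
W2 = [[0.5, -1.2, 0.8]]   # shape (1, 3), i.e. n[2] x n[1]
b2 = [0.1]
y = 1.0
z1 = [0.4, -0.7, 1.1]     # layer-1 pre-activations, taken as given

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(z1):
    # Returns (a1, a2) for the given layer-1 pre-activations
    a1 = [math.tanh(v) for v in z1]
    z2 = sum(W2[0][j] * a1[j] for j in range(len(a1))) + b2[0]
    return a1, sigmoid(z2)

def loss_from_z1(z1):
    _, a2 = forward(z1)
    return -(y * math.log(a2) + (1 - y) * math.log(1 - a2))

a1, a2 = forward(z1)
dz2 = [a2 - y]            # sigmoid output + cross-entropy: dz[2] = a[2] - y

# dz[1] = W2^T dz[2] * g[1]'(z[1]); for tanh, g[1]'(z[1]) = 1 - a1**2
W2T_dz2 = [sum(W2[k][j] * dz2[k] for k in range(len(W2))) for j in range(len(a1))]
dz1 = [W2T_dz2[j] * (1.0 - a1[j] ** 2) for j in range(len(a1))]

def dz1_numeric(j, eps=1e-6):
    # Central finite difference of the loss with respect to z1[j]
    plus = list(z1)
    plus[j] += eps
    minus = list(z1)
    minus[j] -= eps
    return (loss_from_z1(plus) - loss_from_z1(minus)) / (2 * eps)

dz1_num = [dz1_numeric(j) for j in range(len(z1))]
print(dz1)
print(dz1_num)
```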

Hope this helps.
