Backprop derivative after output layer - Course 1: Week 3

In week 3 of the first course, in the lecture videos on gradient descent and backpropagation intuition, it is not explained how the equation for “dz[1]” is arrived at:


Can someone provide an explanation, or show what the full chain rule equation would look like?

Thank you

Welcome to the community.

If Andrew’s title includes ‘intuition’, it scares me… :disappointed_relieved:

Anyway, let me try to fill the gap.

Here is an overview of the network.

The equation marked in blue gives dz^{[l-1]} in the left neuron in terms of dz^{[l]} in the right neuron. Keep in mind that everything originates from the loss function (cost function) at the rightmost end and is “back-propagated” from there.

Let’s start from dz^{[l-1]}. As you know, dz is shorthand for \frac{\partial\mathcal{L}}{\partial z}. First, using the chain rule, we split it into two partial derivatives.

\begin{align} dz^{[l-1]} &= \frac{\partial \mathcal{L}}{\partial z^{[l-1]}} \\ &= \frac{\partial \mathcal{L}}{\partial a^{[l-1]}}\frac{\partial a^{[l-1]}}{\partial z^{[l-1]}} \end{align}

To calculate the first term, we use the chain rule again. We also need to keep in mind that the input to this neuron comes from multiple neurons, each with a different weight, like this.

Here, z^{[l]} can be written as follows.

\begin{align} z^{[l]} &= w^{[l]}a^{[l-1]} + b^{[l]} \\ &= \sum_{i=1}^{n}w_i^{[l]}a_i^{[l-1]} + b^{[l]} \end{align}

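Now we are ready to calculate \frac{\partial \mathcal{L}}{\partial a^{[l-1]}} with the chain rule. Note that a^{[l-1]} feeds every neuron in layer l, so in the next equation the index i runs over the neurons of layer l (and n counts those neurons), whereas in the previous equation it ran over the inputs to a single neuron.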

\begin{align} \frac{\partial \mathcal{L}}{\partial a^{[l-1]}} &= \frac{\partial \mathcal{L}}{\partial z^{[l]} }\frac{\partial z^{[l]}}{\partial a^{[l-1]}} =\sum_{i=1}^{n}dz^{[l]}_{i}w^{[l]}_i \end{align}

We can also rewrite this as a “dot product”. To do so, we need to transpose one of the two factors. Since we want to keep dz in its original form, let’s transpose w.
Now the last equation can be rewritten as follows.

\begin{align} \frac{\partial \mathcal{L}}{\partial a^{[l-1]}} &= \frac{\partial \mathcal{L}}{\partial z^{[l]} }\frac{\partial z^{[l]}}{\partial a^{[l-1]}} =\sum_{i=1}^{n}dz^{[l]}_{i}w^{[l]}_i = {w^{[l]}}^Tdz^{[l]} \end{align}
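For anyone who wants to see where the transpose comes from, here is the same step written per component. This is only a sketch: the indices j (a neuron in layer l-1) and k (a neuron in layer l), the two-index weight w_{kj}^{[l]}, and the layer sizes n^{[l-1]} and n^{[l]} are my own notation, not something taken from the lecture slide.

\begin{align} z_k^{[l]} &= \sum_{j=1}^{n^{[l-1]}} w_{kj}^{[l]} a_j^{[l-1]} + b_k^{[l]} \\ \frac{\partial \mathcal{L}}{\partial a_j^{[l-1]}} &= \sum_{k=1}^{n^{[l]}} \frac{\partial \mathcal{L}}{\partial z_k^{[l]}} \frac{\partial z_k^{[l]}}{\partial a_j^{[l-1]}} = \sum_{k=1}^{n^{[l]}} dz_k^{[l]} w_{kj}^{[l]} = \left({w^{[l]}}^T dz^{[l]}\right)_j \end{align}

The sum runs over the first index of w, which is exactly what multiplying by the transpose does.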

Next, let’s turn to the second term, \frac{\partial a^{[l-1]}}{\partial z^{[l-1]}}, which is relatively simple.
Since a^{[l-1]} = g^{[l-1]}(z^{[l-1]}), it can be calculated as follows.

\begin{align} \frac{\partial a^{[l-1]}}{\partial z^{[l-1]}} &= g^{[l-1]'}(z^{[l-1]}) \\ \end{align}
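As a concrete case, suppose the hidden-layer activation g is tanh (an assumption on my side, the derivation above works for any differentiable g). Then g'(z) = 1 - tanh(z)^2, and a quick finite-difference check is one way to convince yourself. The helper names `tanh_prime` and `numeric_derivative` are made up for this sketch.

```python
import math

def tanh_prime(z):
    """Analytic derivative of tanh: g'(z) = 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2

def numeric_derivative(g, z, eps=1e-6):
    """Central finite-difference approximation of g'(z)."""
    return (g(z + eps) - g(z - eps)) / (2 * eps)

# Print the analytic and numeric derivative side by side for a few points.
for z in (-2.0, 0.0, 0.5):
    print(z, tanh_prime(z), numeric_derivative(math.tanh, z))
```

The two columns should agree to several decimal places.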

Now we can put it all together. In the last line, * denotes the element-wise product, not a matrix product.

\begin{align} dz^{[l-1]} &= \frac{\partial \mathcal{L}}{\partial z^{[l-1]}} \\ &= \frac{\partial \mathcal{L}}{\partial a^{[l-1]}}\frac{\partial a^{[l-1]}}{\partial z^{[l-1]}} \\ \\ &= {w^{[l]}}^Tdz^{[l]}*g^{[l-1]'}(z^{[l-1]}) \end{align}
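A quick shape check may help to see why the transpose is needed and why the last product has to be element-wise. I am assuming column vectors for a single training example and writing n^{[l]} for the number of units in layer l, which is my notation here.

\begin{align} w^{[l]} &: (n^{[l]}, n^{[l-1]}) \\ dz^{[l]} &: (n^{[l]}, 1) \\ {w^{[l]}}^T dz^{[l]} &: (n^{[l-1]}, 1) \\ g^{[l-1]'}(z^{[l-1]}) &: (n^{[l-1]}, 1) \end{align}

Both factors of the final product have the same shape as z^{[l-1]}, so they are multiplied entry by entry.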

Then you just need to set l=2, and you get:

dz^{[1]}= {w^{[2]}}^Tdz^{[2]}*g^{[1]'}(z^{[1]})
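If you want to verify this numerically, below is a minimal sketch in plain Python. Everything specific in it is an assumption for illustration: a tiny network with 2 inputs, 3 tanh hidden units and 1 sigmoid output, the cross-entropy loss (for which dz^{[2]} = a^{[2]} - y), and hand-picked weight values. It compares the formula for dz^{[1]} against a finite-difference estimate of \frac{\partial\mathcal{L}}{\partial z^{[1]}}.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical tiny network: 2 inputs -> 3 tanh hidden units -> 1 sigmoid output.
W1 = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]  # shape (3, 2)
b1 = [0.0, 0.1, -0.1]
W2 = [[0.3, -0.6, 0.2]]                      # shape (1, 3)
b2 = [0.05]
x, y = [1.0, 2.0], 1.0                       # one made-up training example

def loss_from_z1(z1):
    """Run the forward pass from z1 onward; return (cross-entropy loss, a2)."""
    a1 = [math.tanh(z) for z in z1]
    z2 = sum(w * a for w, a in zip(W2[0], a1)) + b2[0]
    a2 = sigmoid(z2)
    loss = -(y * math.log(a2) + (1 - y) * math.log(1 - a2))
    return loss, a2

# Forward pass up to z1, then get a2 from the rest of the network.
z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
_, a2 = loss_from_z1(z1)

# Output layer: sigmoid + cross-entropy gives dz2 = a2 - y.
dz2 = [a2 - y]

# The formula from the thread: dz1 = W2^T dz2 * g1'(z1), with g1 = tanh.
dz1 = [
    sum(W2[k][j] * dz2[k] for k in range(len(dz2))) * (1.0 - math.tanh(z1[j]) ** 2)
    for j in range(len(z1))
]

# Finite-difference estimate of dL/dz1, perturbing one component at a time.
eps = 1e-6
dz1_numeric = []
for j in range(len(z1)):
    plus, minus = list(z1), list(z1)
    plus[j] += eps
    minus[j] -= eps
    dz1_numeric.append((loss_from_z1(plus)[0] - loss_from_z1(minus)[0]) / (2 * eps))

print("formula :", dz1)
print("numeric :", dz1_numeric)
```

The two printed lists should match closely. In NumPy the formula line would typically collapse to a single matrix product with the transposed weights followed by an element-wise multiply.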

Hope this helps.
