Backprop derivative after output layer - Course 1: Week 3

anon57530071 · July 9, 2022, 5:15am

Welcome to the community.

If Andrew’s title includes ‘intuition’, it scares me…

Anyway, let me try to fill the gap.

Here is an overview of the network.

The equation marked in blue gives dz^{[l-1]} for the left neuron in terms of dz^{[l]} for the right neuron. Keep in mind that everything originates from the loss function (cost function) at the far right and is “back-propagated” from there.

Let’s start from dz^{[l-1]}. As you know, dz is shorthand for \frac{\partial\mathcal{L}}{\partial z}. First, using the chain rule, we split this into two partial derivatives.

\begin{align} dz^{[l-1]} &= \frac{\partial \mathcal{L}}{\partial z^{[l-1]}} \\ &= \frac{\partial \mathcal{L}}{\partial a^{[l-1]}}\frac{\partial a^{[l-1]}}{\partial z^{[l-1]}} \end{align}

To calculate the first term, we use the chain rule again. We also need to keep in mind that a neuron in layer l receives input from multiple neurons in layer l-1, each with its own weight.

Here, z^{[l]} can be written as follows, where the sum runs over the neurons of layer l-1.

\begin{align} z^{[l]} &= w^{[l]}a^{[l-1]} + b^{[l]} \\ &= \sum_{i=1}^{n}w_i^{[l]}a_i^{[l-1]} + b^{[l]} \end{align}

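Now we are ready to calculate \frac{\partial \mathcal{L}}{\partial a^{[l-1]}} with the chain rule. Since a^{[l-1]} feeds every neuron in layer l, we sum the contributions from all of them. Note that in the sum below the index i is reused: it now runs over the neurons of layer l, and w^{[l]}_i is the weight connecting a^{[l-1]} to the i-th of those neurons.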

\begin{align} \frac{\partial \mathcal{L}}{\partial a^{[l-1]}} &= \frac{\partial \mathcal{L}}{\partial z^{[l]} }\frac{\partial z^{[l]}}{\partial a^{[l-1]}} =\sum_{i=1}^{n}dz^{[l]}_{i}w^{[l]}_i \end{align}

We can also rewrite this using a “dot product” (a matrix product). For the shapes to line up, one of the two factors has to be transposed. As we want to keep dz in its original form, let’s transpose w.
Now, the last equation can be rewritten as follows.

\begin{align} \frac{\partial \mathcal{L}}{\partial a^{[l-1]}} &= \frac{\partial \mathcal{L}}{\partial z^{[l]} }\frac{\partial z^{[l]}}{\partial a^{[l-1]}} =\sum_{i=1}^{n}dz^{[l]}_{i}w^{[l]}_i = {w^{[l]}}^Tdz^{[l]} \end{align}
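As a quick sanity check (my own sketch, not from the lecture), here is the sum form next to the transposed matrix form in plain Python. The layer sizes, the numbers, and the names `W`, `dz`, `da_sum`, `da_mat` are all made up for illustration; `W[j][i]` is assumed to hold the weight from unit i in layer l-1 to unit j in layer l.

```python
# Hypothetical sizes: 3 units in layer l-1, 2 units in layer l.
W = [[0.1, -0.2, 0.3],
     [0.4, 0.5, -0.6]]   # W[j][i]: weight from unit i (layer l-1) to unit j (layer l)
dz = [1.0, -2.0]         # dz[j] = dL/dz_j for layer l (made-up values)

n_prev = len(W[0])       # number of units in layer l-1
n_cur = len(W)           # number of units in layer l

# Sum form: dL/da_i = sum over the units j of layer l of dz_j * w_ji
da_sum = [sum(dz[j] * W[j][i] for j in range(n_cur)) for i in range(n_prev)]

# Matrix form: transpose W, then take an ordinary matrix-vector product
W_T = [list(col) for col in zip(*W)]                        # shape (n_prev, n_cur)
da_mat = [sum(w * d for w, d in zip(row, dz)) for row in W_T]

print(da_sum)
print(da_mat)
```

Both lists should agree entry by entry, which is all the transpose is doing here. With NumPy arrays the matrix form would be `np.dot(W.T, dz)`.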

Next, let’s look at the second term, \frac{\partial a^{[l-1]}}{\partial z^{[l-1]}}, which is relatively simple.
Since a^{[l-1]} = g^{[l-1]}(z^{[l-1]}), it can be calculated as follows.

\begin{align} \frac{\partial a^{[l-1]}}{\partial z^{[l-1]}} &= g^{[l-1]'}(z^{[l-1]}) \end{align}
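To make g' concrete, here is a small sketch that assumes the activation is tanh (my assumption for illustration; the derivation above works for any differentiable g). It compares the closed-form derivative 1 - tanh(z)^2 with a finite-difference estimate. The helper names `g`, `g_prime`, and `numeric_derivative` are hypothetical.

```python
import math

def g(z):
    """Assumed hidden-layer activation: tanh."""
    return math.tanh(z)

def g_prime(z):
    """Closed-form derivative of tanh: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2

def numeric_derivative(f, z, eps=1e-6):
    """Central finite-difference estimate of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)

# The two columns after z should agree to several decimal places.
for z in (-1.5, 0.0, 0.7):
    print(z, g_prime(z), numeric_derivative(g, z))
```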

Now we can put it all together.

\begin{align} dz^{[l-1]} &= \frac{\partial \mathcal{L}}{\partial z^{[l-1]}} \\ &= \frac{\partial \mathcal{L}}{\partial a^{[l-1]}}\frac{\partial a^{[l-1]}}{\partial z^{[l-1]}} \\ \\ &= {w^{[l]}}^Tdz^{[l]}*g^{[l-1]'}(z^{[l-1]}) \end{align}

Here * denotes the element-wise product. Then you just need to set l=2, and you get:

dz^{[1]}= {w^{[2]}}^Tdz^{[2]}*g^{[1]'}(z^{[1]})
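If you want to convince yourself numerically, the sketch below checks this formula on a tiny made-up network for a single example. Everything specific is an assumption for illustration: 3 tanh hidden units, 1 sigmoid output unit with cross-entropy loss (which gives dz^{[2]} = a^{[2]} - y), and the particular values of `W2`, `b2`, `z1`, and `y`.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical tiny network: 3 hidden units (tanh), 1 output unit (sigmoid).
W2 = [[0.5, -0.3, 0.8]]   # shape (1, 3)
b2 = [0.1]
z1 = [0.2, -0.4, 0.6]     # pre-activations of the hidden layer (made-up values)
y = 1.0                   # label of the single training example

def loss_from_z1(z1_vals):
    """Forward pass from z1 to the cross-entropy loss."""
    a1 = [math.tanh(z) for z in z1_vals]
    z2 = sum(w * a for w, a in zip(W2[0], a1)) + b2[0]
    a2 = sigmoid(z2)
    return -(y * math.log(a2) + (1.0 - y) * math.log(1.0 - a2))

# Analytic backprop: dz1 = W2^T dz2 * g'(z1)
a1 = [math.tanh(z) for z in z1]
z2 = sum(w * a for w, a in zip(W2[0], a1)) + b2[0]
a2 = sigmoid(z2)
dz2 = [a2 - y]            # assumes sigmoid output with cross-entropy loss
W2_T_dz2 = [sum(W2[j][i] * dz2[j] for j in range(len(dz2)))
            for i in range(len(z1))]
dz1 = [v * (1.0 - a ** 2) for v, a in zip(W2_T_dz2, a1)]   # element-wise product

# Numerical check: perturb one entry of z1 at a time
eps = 1e-6
dz1_num = []
for i in range(len(z1)):
    plus = list(z1)
    plus[i] += eps
    minus = list(z1)
    minus[i] -= eps
    dz1_num.append((loss_from_z1(plus) - loss_from_z1(minus)) / (2 * eps))

print(dz1)
print(dz1_num)
```

The two printed lists should match to several decimal places. In the vectorized NumPy style used in the course, the analytic line would look like `np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))`.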

Hope this helps.
