Week 3, "Gradient Descent for Neural Networks"

Hello @Juheon_Chu,

The idea is just the chain rule.

According to the left-hand side of the slide, you know how the cost depends on Z^[1]:

[slide image]
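In case you are reading this without the slide open, the dependency chain is, from memory (the standard two-layer network of this week, with $g^{[1]}$ the hidden activation and $\sigma$ the output sigmoid):

$$Z^{[1]} = W^{[1]} X + b^{[1]}, \quad A^{[1]} = g^{[1]}(Z^{[1]}), \quad Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}, \quad A^{[2]} = \sigma(Z^{[2]}), \quad \mathcal{J} = \mathcal{J}(A^{[2]}, Y),$$

so the cost depends on $Z^{[1]}$ only through $A^{[1]}$ and then $Z^{[2]}$.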

Then we can name the relevant derivatives:
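In scalar-style notation, they are

$$\frac{\partial \mathcal{J}}{\partial Z^{[2]}}, \qquad \frac{\partial Z^{[2]}}{\partial A^{[1]}}, \qquad \frac{\partial A^{[1]}}{\partial Z^{[1]}}.$$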

However, we can't just multiply them together, because the chain rule for matrices is not like the chain rule for scalars that we learned in high school. That does not stop us from finding out what each of those derivatives is:
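Concretely, with the forward equations above: $\frac{\partial Z^{[2]}}{\partial A^{[1]}}$ involves only $W^{[2]}$, because $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$ is linear in $A^{[1]}$; $\frac{\partial A^{[1]}}{\partial Z^{[1]}} = g^{[1]\prime}(Z^{[1]})$, taken element-wise, because $g^{[1]}$ acts element-wise; and $\frac{\partial \mathcal{J}}{\partial Z^{[2]}}$ is just the $dZ^{[2]}$ you already computed in the previous backprop step.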

So we now have all of the terms needed for that final formula.

The final formula in the slide tells us

  • the correct order of those terms,
  • the need for transposing W^[2], and
  • the element-wise multiplication operator

as a result of the chain rule involving matrices (see the formula reproduced below).
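For reference, if I am remembering the slide correctly, that final formula is

$$dZ^{[1]} = W^{[2]T} \, dZ^{[2]} \ast g^{[1]\prime}(Z^{[1]}),$$

with $\ast$ denoting element-wise multiplication.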

If you have time, go through this post for an example of how the matrix-based chain rule differs from the usual scalar-based chain rule. In fact, you will see why W has to be transposed and switches positions with dZ. You can also use the same idea to prove that final formula, but it is going to take some time ;).
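If you would rather convince yourself numerically before attempting the proof, here is a minimal NumPy sketch (my own, not from the course notebooks; it assumes a tanh hidden layer, a sigmoid output, and the logistic loss) that checks the formula against finite differences:

```python
import numpy as np

# Numerical check of dZ1 = W2.T @ dZ2 * g1'(Z1) on a tiny two-layer
# network (tanh hidden layer, sigmoid output, logistic loss).
# Shapes follow the course convention: one column per example.
rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 5                        # input size, hidden size, examples
X = rng.standard_normal((n_x, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
W1 = rng.standard_normal((n_h, n_x)); b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h));   b2 = np.zeros((1, 1))

def cost_from_Z1(Z1):
    """Cost J as a function of Z1, holding W2, b2, and Y fixed."""
    A1 = np.tanh(Z1)                                  # A1 = g1(Z1)
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))        # sigmoid output
    return -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))

# Forward pass
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))

# Backprop. The slides write dZ2 = A2 - Y and fold the 1/m into dW;
# dividing by m here makes dZ1 equal the true dJ/dZ1 entry by entry.
dZ2 = (A2 - Y) / m
dZ1 = W2.T @ dZ2 * (1 - A1 ** 2)                      # 1 - tanh^2 = g1'(Z1)

# Central finite differences, one entry of Z1 at a time
eps = 1e-6
dZ1_num = np.zeros_like(Z1)
for i in range(n_h):
    for j in range(m):
        Zp, Zm = Z1.copy(), Z1.copy()
        Zp[i, j] += eps
        Zm[i, j] -= eps
        dZ1_num[i, j] = (cost_from_Z1(Zp) - cost_from_Z1(Zm)) / (2 * eps)

print(np.max(np.abs(dZ1 - dZ1_num)))                  # tiny (~1e-10): they match
```

The printed number should be around $10^{-9}$ or smaller. If you drop the transpose or swap the order of the factors, NumPy will typically raise a shape error, which is a quick way to see that the matrix chain rule really does dictate those details.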

Cheers,
Raymond

PS: You can add a backslash between ^ and [ when you type Z^[1] so that it can be displayed correctly. I corrected your post for you.
