Hello @Juheon_Chu,
The idea is just the chain rule.
From the left side, you can see how the cost depends on Z^[1], so we can name the relevant derivatives:
However, we can't just multiply them together, because the chain rule for matrices is not like the chain rule for scalars that we learnt in high school. That does not stop us, though, from working out what each of those derivatives is:
So we get all of the terms needed for the final formula.
The final formula on the slide tells us
- the correct order of those terms,
- the need to transpose W^[2], and
- where the element-wise multiplication goes,
all as a result of applying the chain rule to matrices.
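If it helps, here is a minimal NumPy sketch that checks the usual form of that formula, dZ^[1] = W^[2]T dZ^[2] * g^[1]'(Z^[1]), against a finite-difference estimate. The setup is my assumption, not something from the slide: a tanh hidden layer (so g^[1]'(Z^[1]) = 1 - A^[1]**2), a sigmoid output with cross-entropy loss summed over examples (so dZ^[2] = A^[2] - Y), and random shapes. The function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_from_Z1(Z1, W2, b2, Y):
    """Summed binary cross-entropy of the layer-2 output, viewed as a function of Z1 (assumed setup)."""
    A1 = np.tanh(Z1)
    A2 = sigmoid(W2 @ A1 + b2)
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))

def backprop_dZ1(Z1, W2, b2, Y):
    """dZ1 = W2^T dZ2 * g1'(Z1), with g1 = tanh so g1'(Z1) = 1 - A1**2."""
    A1 = np.tanh(Z1)
    A2 = sigmoid(W2 @ A1 + b2)
    dZ2 = A2 - Y                          # shape (n2, m)
    return (W2.T @ dZ2) * (1 - A1 ** 2)   # (n1, n2) @ (n2, m) -> (n1, m), then element-wise

def numerical_dZ1(Z1, W2, b2, Y, eps=1e-6):
    """Central-difference estimate of dJ/dZ1, one entry at a time."""
    grad = np.zeros_like(Z1)
    for idx in np.ndindex(Z1.shape):
        Zp = Z1.copy(); Zp[idx] += eps
        Zm = Z1.copy(); Zm[idx] -= eps
        grad[idx] = (cost_from_Z1(Zp, W2, b2, Y) - cost_from_Z1(Zm, W2, b2, Y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
n1, n2, m = 4, 1, 5                       # hidden units, output units, examples
Z1 = rng.normal(size=(n1, m))
W2 = rng.normal(size=(n2, n1))
b2 = rng.normal(size=(n2, 1))
Y = rng.integers(0, 2, size=(n2, m)).astype(float)

analytic = backprop_dZ1(Z1, W2, b2, Y)
numeric = numerical_dZ1(Z1, W2, b2, Y)
print(np.max(np.abs(analytic - numeric)))  # should be very small if the formula is right
```

If you change the order (for example, dZ^[2] @ W^[2]) or drop the transpose, the shapes stop matching, which is a quick hint that the scalar-style ordering does not carry over to matrices.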
If you have time, go through this post for an example of how the matrix-based chain rule differs from the usual scalar-based chain rule. In fact, you will see why W has to be transposed and switch positions with dZ. You can also use the same idea to prove that final formula, but it is going to take some time ;).
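As a smaller illustration of the "transpose and switch positions" point, here is a sketch under my own assumptions: take Z = W A and an arbitrary scalar cost J = sum(sin(Z)), so dJ/dZ = cos(Z). Entry by entry, dJ/dA[k, j] = sum over i of W[i, k] * dZ[i, j], which is exactly (W^T dZ)[k, j]. The code checks this numerically; the cost function and shapes are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(2, 3))   # plays the role of W^[2]
A = rng.normal(size=(3, 4))   # plays the role of A^[1]

def J(A):
    return np.sum(np.sin(W @ A))   # an arbitrary scalar cost of Z = W A (assumption)

dZ = np.cos(W @ A)               # dJ/dZ, shape (2, 4)
dA = W.T @ dZ                    # matrix chain rule: transpose W and put it on the left

# Central-difference check of dJ/dA, one entry at a time
eps = 1e-6
dA_num = np.zeros_like(A)
for idx in np.ndindex(A.shape):
    Ap = A.copy(); Ap[idx] += eps
    Am = A.copy(); Am[idx] -= eps
    dA_num[idx] = (J(Ap) - J(Am)) / (2 * eps)

print(np.allclose(dA, dA_num))   # True
# The scalar-style product "dZ times W" does not even have matching shapes:
# dZ is (2, 4) and W is (2, 3).
```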
Cheers,
Raymond
PS: You can add a backslash between ^ and [ when you type Z^[1] (that is, type Z^\[1]) so that it displays correctly. I corrected your post for you.