Week 3: Why dZ^[1] = W^[2]T dZ^[2] * g^[1]'(Z^[1])

rmwkwok · February 13, 2023, 3:22am

As @paulinpaloalto suggested, we often resort to checking the matrices’ indices to make sure all symbols are in order. As he also pointed out, we can’t always do matrix calculus in the way we do scalar calculus. As Wikipedia summarizes very well:

Note that exact equivalents of the scalar product rule and chain rule do not exist when applied to matrix-valued functions of matrices.
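The index check mentioned above can be sketched for the formula in the topic title. This is a minimal NumPy example, not the course's code: the layer sizes, the tanh activation, the absence of bias terms, and the toy loss L = 0.5 * sum(Z2**2) are all assumptions chosen to keep it short. It checks that the shapes line up and compares the formula against a finite-difference gradient.

```python
import numpy as np

# Hypothetical sizes: n0 inputs, n1 hidden units, n2 outputs, m examples.
n0, n1, n2, m = 3, 4, 2, 5
rng = np.random.default_rng(0)
X = rng.normal(size=(n0, m))
W1 = rng.normal(size=(n1, n0))
W2 = rng.normal(size=(n2, n1))


def loss_from_Z1(Z1):
    """Toy loss as a function of Z1 only (assumed, no biases)."""
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1
    return 0.5 * np.sum(Z2 ** 2)


# Forward pass.
Z1 = W1 @ X
A1 = np.tanh(Z1)
Z2 = W2 @ A1

# Backward pass.
dZ2 = Z2                  # dL/dZ2 for the toy loss above
g_prime = 1.0 - A1 ** 2   # derivative of tanh evaluated at Z1

# Index check: (n1, n2) @ (n2, m) -> (n1, m), then elementwise with (n1, m).
dZ1 = (W2.T @ dZ2) * g_prime

# Central finite differences, one entry of Z1 at a time.
eps = 1e-6
dZ1_numeric = np.zeros_like(Z1)
for idx in np.ndindex(Z1.shape):
    Z_plus = Z1.copy()
    Z_plus[idx] += eps
    Z_minus = Z1.copy()
    Z_minus[idx] -= eps
    dZ1_numeric[idx] = (loss_from_Z1(Z_plus) - loss_from_Z1(Z_minus)) / (2 * eps)

print(dZ1.shape == Z1.shape)
print(np.allclose(dZ1, dZ1_numeric, atol=1e-5))
```

The transpose on W2 is what makes the inner dimensions match; the elementwise product only works because W2.T @ dZ2 and g'(Z1) have the same shape as Z1.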

We can do a simple exercise to verify that indeed we can’t use the chain rule like that. Let’s say we define matrices Z, W, A and a scalar L this way,

Obviously,

Can our scalar “chain rule” recover the same result?

The answer is no. But the following will do:
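A minimal NumPy sketch of this kind of exercise, assuming Z = W A with 2x2 matrices and L = 0.5 * sum(Z**2). These definitions and the numeric values are illustrative assumptions, not necessarily the ones used in the exercise above. It compares a naive "scalar-style" product with the transposed form against a finite-difference gradient of L with respect to A.

```python
import numpy as np

# Assumed example values, chosen so that W is not symmetric.
W = np.array([[1.0, 2.0], [3.0, 4.0]])
A = np.array([[1.0, 0.0], [2.0, 1.0]])


def loss_wrt_A(A_):
    """L = 0.5 * sum(Z**2) with Z = W @ A_ (assumed toy loss)."""
    Z_ = W @ A_
    return 0.5 * np.sum(Z_ ** 2)


Z = W @ A
dZ = Z.copy()  # dL/dZ for this toy loss

# Ground truth: central finite differences on each entry of A.
eps = 1e-6
dA_numeric = np.zeros_like(A)
for idx in np.ndindex(A.shape):
    A_plus = A.copy()
    A_plus[idx] += eps
    A_minus = A.copy()
    A_minus[idx] -= eps
    dA_numeric[idx] = (loss_wrt_A(A_plus) - loss_wrt_A(A_minus)) / (2 * eps)

# Naive "scalar chain rule": treat dZ/dA as just W and multiply.
dA_naive = W @ dZ

# Form used in backprop: transpose W.
dA_correct = W.T @ dZ

print("naive matches numeric:  ", np.allclose(dA_naive, dA_numeric, atol=1e-4))
print("correct matches numeric:", np.allclose(dA_correct, dA_numeric, atol=1e-4))
```

With square matrices the naive product has a valid shape, so nothing crashes; only the values are wrong. That is why checking against a numeric gradient is a useful habit in addition to checking shapes.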

Lastly, in my step (5), I said “this is wrong” because a matrix-by-matrix derivative like that won’t result in a 2x2 matrix. As the same Wikipedia page says:

Matrix calculus refers to a number of different notations that use matrices and vectors to collect the derivative of each component of the dependent variable with respect to each component of the independent variable.

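So, to correct that, the result is going to be a tensor that organizes all 16 derivatives: each of the 4 entries of one 2x2 matrix differentiated with respect to each of the 4 entries of the other.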
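To make the 16 derivatives concrete, here is a small NumPy sketch. It assumes the matrix-by-matrix derivative in question is dZ/dW for Z = W A with 2x2 matrices; the values of W and A and the toy loss L = 0.5 * sum(Z**2) are illustrative assumptions. The result is stored as a 4-index array J[i, j, k, l] = dZ[i, j] / dW[k, l], and contracting it with dL/dZ gives back the usual 2x2 gradient.

```python
import numpy as np

# Assumed example values.
W = np.array([[1.0, 2.0], [3.0, 4.0]])
A = np.array([[1.0, 0.0], [2.0, 1.0]])

# Build J entry by entry. Z is linear in W, so perturbing W[k, l] by 1
# changes Z by exactly E_kl @ A, where E_kl is 1 at (k, l) and 0 elsewhere.
J = np.zeros((2, 2, 2, 2))
for k in range(2):
    for l in range(2):
        E = np.zeros((2, 2))
        E[k, l] = 1.0
        J[:, :, k, l] = E @ A

# Closed form of the same tensor: J[i, j, k, l] = delta(i, k) * A[l, j].
J_closed = np.einsum("ik,lj->ijkl", np.eye(2), A)

# Contract with dL/dZ (here dL/dZ = Z for the assumed toy loss).
dZ = W @ A
dW_from_tensor = np.einsum("ij,ijkl->kl", dZ, J)

print(J.shape, J.size)
print(np.allclose(J, J_closed))
print(np.allclose(dW_from_tensor, dZ @ A.T))
```

So the compact formula dW = dZ A^T is what remains after summing over the two indices of Z in the full tensor; the tensor itself never has to be built in practice.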

I think the Wikipedia page has a lot of useful examples if you want to dig deeper.

Cheers,
Raymond
