Here is the video I watched: https://www.coursera.org/learn/neural-networks-deep-learning/lecture/6dDj7/backpropagation-intuition-optional
I know calculus and fully understand how to compute the gradients dL/dz^{2}, dL/dW^{2}, and dL/db^{2}, which have shapes R^{1x1}, R^{1xn^{1}}, and R^{1x1} respectively (where n^{i} is the number of neurons in layer i, and I use the character d for the partial derivative here). But things get complicated when I go further and I cannot figure out the result; could anyone help me, please? I got some results, but they don't match the right answer (the picture I put here):
Here are some of my efforts:
dL/da^{1} = dL/dz^{2} . dz^{2}/da^{1} = (a^{2} - y).W^{2}, shape 1xn^{1}, where W^{2} is the matrix whose rows are the vectors w_i^{2}T.
dL/dz^{1} = dL/da^{1} . da^{1}/dz^{1} = (a^{2} - y).W^{2} * g'^{1}(z^{1})
dL/dW^{1} = dL/dz^{1} . dz^{1}/dW^{1}; here I don't know what to do. I figured out that dz^{1}/dW^{1} has shape n^{1}x(n^{1}xn^{0}), and it is:
dz^{1}/dW^{1} = [dz_i^{1}/dW^{1}] stacked vertically, with
dz_i^{1}/dW^{1} = [0^T … x^T … 0^T], which has shape 1x(n^{1}xn^{0})
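To make the shapes concrete, here is a small numpy sketch of what I mean (a single sample; the layer sizes are made up, and I assume sigmoid activations and cross-entropy loss so that dL/dz^{2} = a^{2} - y, as in the picture):

```python
import numpy as np

n0, n1 = 3, 4                      # neurons in layer 0 (input) and layer 1

x = np.random.randn(n0, 1)         # single sample, shape (n0, 1)
y = np.array([[1.0]])              # label, shape (1, 1)

W1 = np.random.randn(n1, n0)       # shape (n1, n0)
b1 = np.zeros((n1, 1))
W2 = np.random.randn(1, n1)        # rows are the w_i^{2}T vectors
b2 = np.zeros((1, 1))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# forward pass
z1 = W1 @ x + b1                   # (n1, 1)
a1 = sigmoid(z1)                   # (n1, 1)
z2 = W2 @ a1 + b2                  # (1, 1)
a2 = sigmoid(z2)                   # (1, 1)

# the gradients I can already compute
dz2 = a2 - y                       # dL/dz^{2}, shape (1, 1)
dW2 = dz2 @ a1.T                   # dL/dW^{2}, shape (1, n1)
db2 = dz2                          # dL/db^{2}, shape (1, 1)

# my attempt at the next layer (row-vector convention)
da1 = dz2 @ W2                     # dL/da^{1}, shape (1, n1)
dz1 = da1 * (a1 * (1 - a1)).T      # dL/dz^{1}, shape (1, n1); g' = a1*(1-a1)
# dL/dW^{1} is where I get stuck: dz^{1}/dW^{1} is a 3-D object
```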
I know it's quite dense to read, but I hope you understand the idea. Thank you, everyone.
Check this YouTube guide by Eddy Shyu and this chain rule explanation.
I watched the videos, but my problem is more complex. Just one short question: could you help me understand why dz^1 = W^{2}T.dz^2 * g'^1(z^1), the 4th equation in the picture I showed here? I think it should be:
(a^2 - y).W^2 * g'^1(z^1)
Hello, @prhrurcr09,
Firstly, if you check the shapes, you will find that this multiplication “(a^2 - y).W^2” can’t be carried out, because the last dimension of “(a^2 - y)” is the sample dimension, but the first dimension of “W^2” isn’t.
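To see it concretely, here is a minimal numpy sketch (assuming m samples stacked as columns, as in the course's vectorized convention; the sizes are made up):

```python
import numpy as np

n0, n1, m = 3, 4, 5                    # layer sizes and number of samples

W2  = np.random.randn(1, n1)           # shape (1, n1)
A2  = np.random.rand(1, m)             # predictions, shape (1, m)
Y   = np.random.randint(0, 2, (1, m))  # labels, shape (1, m)
dZ2 = A2 - Y                           # shape (1, m): last dim is the sample dim

# dZ2 @ W2 raises a ValueError: inner dimensions (m and 1) don't match.
# W2.T @ dZ2 works: (n1, 1) x (1, m) -> (n1, m), one column per sample.
dA1 = W2.T @ dZ2
print(dA1.shape)                       # (4, 5)
```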
Besides, if you want to derive it, note that the chain rule may not work in the same way when it comes to matrices. In fact, a naive application of it would never produce a transpose sign in the result, would it?
Below is a previous draft of mine deriving one of the formulae in this reading item in C1 W4.
As you can see, I first changed my focus from matrices to their elements, then applied the chain rule to the elements, and then went back to matrices, where I found the transpose sign in its right place.
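In short, the idea is the following (a condensed sketch for a single sample and a single output neuron, not the exact draft):

```latex
% z^{2} is a scalar here: z^{2} = \sum_j W^{2}_{1j} a^{1}_j + b^{2}.
% Apply the chain rule to ELEMENT i of z^{1}:
\frac{\partial L}{\partial z^{1}_i}
  = \frac{\partial L}{\partial z^{2}}
    \cdot \frac{\partial z^{2}}{\partial a^{1}_i}
    \cdot \frac{\partial a^{1}_i}{\partial z^{1}_i}
  = dz^{2} \, W^{2}_{1i} \, g'^{1}(z^{1}_i)
% Stacking these over i = 1, \dots, n^{1}: the scalars W^{2}_{1i} are exactly
% the entries of the COLUMN vector W^{2T}, which is where the transpose
% comes from:
dz^{1} = W^{2T} dz^{2} * g'^{1}(z^{1})
```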
If you want to derive it yourself, I hope this gives you an idea!
Cheers,
Raymond
Thank you, I think I got the results. Is it okay if I share my work with everyone?
Hi @prhrurcr09, you mean the derivations of the formulae? I think that's okay!
Cheers