Here is the video I watched: https://www.coursera.org/learn/neural-networks-deep-learning/lecture/6dDj7/backpropagation-intuition-optional
I know calculus and fully understand how to compute the gradients dL/dz^{2}, dL/dW^{2}, and dL/db^{2}, which have shapes R^{1x1}, R^{1xn^{1}}, and R^{1x1} respectively (where n^{i} is the number of neurons in layer i, and I use the character d for the partial derivative here). But things get complicated when I go further and I cannot figure out the result; could anyone help me, please? I got some results, but they don't match the right answer (the picture I put here):
Here are some of my efforts:
dL/da^{1} = dL/dz^{2} . dz^{2}/da^{1} = (a^{2} - y).W^{2}, shape 1xn^{1}, where W^{2} is the matrix whose rows are the vectors w_i^{2}T.
dL/dz^{1} = dL/da^{1} . da^{1}/dz^{1} = (a^{2} - y).W^{2} * g'^{1}(z^{1})
dL/dW^{1} = dL/dz^{1} . dz^{1}/dW^{1}; here I don't know what to do. I figured out that dz^{1}/dW^{1} has shape n^{1}x(n^{1}xn^{0}), and it is:
dz^{1}/dW^{1} = [dz_i^{1}/dW^{1}] stacked vertically, with
dz_i^{1}/dW^{1} = [0^T … x^T … 0^T], which has shape 1x(n^{1}xn^{0})
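To make the shapes concrete, here is a small numpy sketch of what I mean (a single sample; the layer sizes are made up, and I assume sigmoid activations and cross-entropy loss so that dL/dz^{2} = a^{2} - y, as in the picture):

```python
import numpy as np

n0, n1 = 3, 4                      # neurons in layer 0 (input) and layer 1

x = np.random.randn(n0, 1)         # single sample, shape (n0, 1)
y = np.array([[1.0]])              # label, shape (1, 1)

W1 = np.random.randn(n1, n0)       # shape (n1, n0)
b1 = np.zeros((n1, 1))
W2 = np.random.randn(1, n1)        # rows are the w_i^{2}T vectors
b2 = np.zeros((1, 1))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# forward pass
z1 = W1 @ x + b1                   # (n1, 1)
a1 = sigmoid(z1)                   # (n1, 1)
z2 = W2 @ a1 + b2                  # (1, 1)
a2 = sigmoid(z2)                   # (1, 1)

# the gradients I can already compute
dz2 = a2 - y                       # dL/dz^{2}, shape (1, 1)
dW2 = dz2 @ a1.T                   # dL/dW^{2}, shape (1, n1)
db2 = dz2                          # dL/db^{2}, shape (1, 1)

# my attempt at the next layer (row-vector convention)
da1 = dz2 @ W2                     # dL/da^{1}, shape (1, n1)
dz1 = da1 * (a1 * (1 - a1)).T      # dL/dz^{1}, shape (1, n1); g' = a1*(1-a1)
# dL/dW^{1} is where I get stuck: dz^{1}/dW^{1} is a 3-D object
```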
I know it's quite dense to read, but I hope you understand the idea. Thank you, everyone.
Check this YouTube guide by Eddy Shyu and this chain rule explanation.
I watched the videos, but my problem is more complex. Just one short question: could you help me understand why dz^1 = W^{2}T.dz^2 * g'^1(z^1), the 4th equation in the picture I showed here? I think it should be:
(a^2 - y).W^2 * g'^1(z^1)
Hello, @prhrurcr09,
Firstly, if you check the shapes, you will find that this multiplication “(a^2 - y).W^2” can’t be carried out, because the last dimension of “(a^2 - y)” is the sample dimension, but the first dimension of “W^2” isn’t.
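To see it concretely, here is a minimal numpy sketch (assuming m samples stacked as columns, as in the course's vectorized convention; the sizes are made up):

```python
import numpy as np

n0, n1, m = 3, 4, 5                    # layer sizes and number of samples

W2  = np.random.randn(1, n1)           # shape (1, n1)
A2  = np.random.rand(1, m)             # predictions, shape (1, m)
Y   = np.random.randint(0, 2, (1, m))  # labels, shape (1, m)
dZ2 = A2 - Y                           # shape (1, m): last dim is the sample dim

# dZ2 @ W2 raises a ValueError: inner dimensions (m and 1) don't match.
# W2.T @ dZ2 works: (n1, 1) x (1, m) -> (n1, m), one column per sample.
dA1 = W2.T @ dZ2
print(dA1.shape)                       # (4, 5)
```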
Besides, if you want to derive it, note that the chain rule may not work in the same way when it comes to matrices. In fact, a naive application of it would never produce a transpose sign in the result, would it?
Below is a previous draft of mine deriving one of the formulae in this reading item in C1 W4.
As you can see, I first changed my focus from matrices to their elements, then applied the chain rule to the elements, and then went back to matrices, where I found the transpose sign in its right place.
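In short, the idea is the following (a condensed sketch for a single sample and a single output neuron, not the exact draft):

```latex
% z^{2} is a scalar here: z^{2} = \sum_j W^{2}_{1j} a^{1}_j + b^{2}.
% Apply the chain rule to ELEMENT i of z^{1}:
\frac{\partial L}{\partial z^{1}_i}
  = \frac{\partial L}{\partial z^{2}}
    \cdot \frac{\partial z^{2}}{\partial a^{1}_i}
    \cdot \frac{\partial a^{1}_i}{\partial z^{1}_i}
  = dz^{2} \, W^{2}_{1i} \, g'^{1}(z^{1}_i)
% Stacking these over i = 1, \dots, n^{1}: the scalars W^{2}_{1i} are exactly
% the entries of the COLUMN vector W^{2T}, which is where the transpose
% comes from:
dz^{1} = W^{2T} dz^{2} * g'^{1}(z^{1})
```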
If you want to derive it yourself, I hope this gives you an idea!
Cheers,
Raymond
Thank you, I think I got the results. Is it okay if I share my work with everyone?
Hi @prhrurcr09, you mean the derivations of the formulae? I think that's okay!
Cheers