Assignment Building NN C1 Week 4

I managed to successfully submit both programming assignments in Course 1 / Week 4, but I’m still struggling to understand why the code works and to reconcile it with Andrew’s “Backpropagation Intuition” slides, in particular the “Summary of gradient descent” slide.

Three questions:

  1. In Andrew’s slide we start by computing dZ[2] = A[2] - Y and we don’t use dA[2].
    In the “two_layer_model” function we start by calculating dA2 and don’t compute dZ[2] directly.

Why?

  2. In the first programming assignment, paragraph 6.1, equation 10 gives the formula for dA[l-1]. Where does this formula come from? I can’t find it derived in Andrew’s slides (maybe I missed it).

  3. Overall: I don’t see how the algorithm in the programming assignment follows the six equations from Andrew’s slide (for example, we calculate dA_prev, which doesn’t appear in those equations). Why is that?

Although I got 100% on the assignment, I feel my understanding of it is superficial: I don’t fully understand the algorithm or how it applies Andrew’s concepts… I have probably missed something.

Many thanks for your help on any or all of the questions above.

Hi @Nicolas, there is no week 4 in course 1. Please share the correct course and week numbers. Since you mentioned slides, please also share which video, and the timestamp of the content you want to discuss.

Sorry for the mistake. This question is for the Deep Learning Specialization; I posted it in the wrong forum. Please ignore.

thanks


No problem Nicolas. Closing this.

Hi @Nicolas_Hirel,

I shall move this post to DLS

Hey @Nicolas_Hirel,

Prof Andrew derives this formula in the video entitled “Forward and Backward Propagation” in Week 4 of Course 1.

As to this, I am assuming you are referring to the video entitled “Backpropagation Intuition (Optional)”. In this video, Prof Andrew does indeed start with the calculation of dz^{[2]}, but you will also find him stating that:

I’m going to skip explicitly computing da. If you want, you can actually compute da^2, and then use that to compute dz^2. But in practice, you could collapse both of these steps into one step.

And this is what Prof Andrew has done in this video, I assume for the sake of simplicity. But below you can find the complete derivation, if that is what you are looking for. Also, I have attached the network just for our reference.

Here, a^{[2]} forms the predictions for this particular model, i.e., \hat{y}, so:

L(a^{[2]}, y) = -y \log(a^{[2]}) - (1 - y) \log(1 - a^{[2]}) \\ da^{[2]} = \dfrac{\partial L}{\partial a^{[2]}} = -\dfrac{y}{a^{[2]}} + \dfrac{1-y}{1-a^{[2]}} \\ da^{[2]} = \dfrac{a^{[2]} - y}{a^{[2]} (1 - a^{[2]})}
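As a side note, this da^{[2]} is the quantity the notebook’s backward pass starts from. Assuming numpy arrays AL (the sigmoid outputs of the last layer) and Y (the labels) of the same shape, the elementwise version looks something like this (my own illustration with made-up values, not a quote from the notebook):

```python
import numpy as np

# Hypothetical values for a batch of 3 examples
AL = np.array([[0.8, 0.2, 0.6]])   # a^{[2]}: sigmoid outputs of the output layer
Y  = np.array([[1.0, 0.0, 1.0]])   # ground-truth labels

# da^{[2]} = -y/a + (1-y)/(1-a), applied elementwise over the batch
dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
print(dAL)   # [[-1.25        1.25       -1.66666667]]
```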

Now, moving ahead:

dz^{[2]} = \dfrac{\partial L}{\partial a^{[2]}} \cdot \dfrac{\partial a^{[2]}}{\partial z^{[2]}} = da^{[2]} \cdot \dfrac{\partial a^{[2]}}{\partial z^{[2]}} \\ \dfrac{\partial a^{[2]}}{\partial z^{[2]}} = \dfrac{e^{-z^{[2]}}}{(1 + e^{-z^{[2]}})^2} = \dfrac{1}{1 + e^{-z^{[2]}}} \cdot \dfrac{e^{-z^{[2]}}}{1 + e^{-z^{[2]}}} \\ \dfrac{\partial a^{[2]}}{\partial z^{[2]}} = \sigma(z^{[2]}) \, (1 - \sigma(z^{[2]})) = a^{[2]} (1 - a^{[2]}) \\ dz^{[2]} = da^{[2]} \cdot a^{[2]} (1 - a^{[2]}) \\ dz^{[2]} = a^{[2]} - y

In this derivation, I am assuming that you are comfortable with basic calculus, which is probably one of the reasons Prof Andrew skipped it in the lecture. Let me know if this helps.
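If it helps, here is a quick numerical sanity check (my own numpy sketch with made-up values, not part of the assignment) showing that the two-step route da^{[2]} → dz^{[2]} and the collapsed lecture formula dz^{[2]} = a^{[2]} - y give the same numbers:

```python
import numpy as np

A2 = np.array([[0.8, 0.2, 0.6]])   # hypothetical sigmoid outputs a^{[2]}
Y  = np.array([[1.0, 0.0, 1.0]])   # hypothetical labels

dA2 = -(np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))   # da^{[2]}
dZ2_two_step  = dA2 * A2 * (1 - A2)                    # da^{[2]} * sigma'(z^{[2]})
dZ2_collapsed = A2 - Y                                  # collapsed formula from the lecture

print(np.allclose(dZ2_two_step, dZ2_collapsed))        # True
```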

Cheers,
Elemento

Elemento, many thanks for your very precise answer. This is indeed extremely useful.

Would you by any chance know how to prove dA^{[l-1]} = W^{[l]T} \cdot dZ^{[l]}?
As mentioned by Mentor Mubsi (thanks Mubsi!), Prof Andrew states this equation on the first slide of “Forward and Backward Propagation” in Week 4, but chose not to explain it in more detail (for the sake of simplicity). I’ve tried to derive it myself but couldn’t. If not, please ignore my question (I’m just curious about it).

many thanks!

Hey @Nicolas_Hirel,
Please find the derivation below, and do ignore my handwriting. I was a little too lazy to type this up in LaTeX :joy:, that’s why I resorted to the traditional method of conveying information.
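In short, here is a typed sketch of the main chain-rule step (my summary, in case the handwritten image is hard to read). Writing the linear part of layer l component-wise:

z^{[l]}_i = \sum_k W^{[l]}_{ik} \, a^{[l-1]}_k + b^{[l]}_i \implies \dfrac{\partial z^{[l]}_i}{\partial a^{[l-1]}_k} = W^{[l]}_{ik} \\ da^{[l-1]}_k = \dfrac{\partial L}{\partial a^{[l-1]}_k} = \sum_i \dfrac{\partial L}{\partial z^{[l]}_i} \cdot \dfrac{\partial z^{[l]}_i}{\partial a^{[l-1]}_k} = \sum_i W^{[l]}_{ik} \, dz^{[l]}_i = \left( W^{[l]T} dz^{[l]} \right)_k

which, stacked over all units (and over the examples in a batch), is exactly dA^{[l-1]} = W^{[l]T} dZ^{[l]}.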

I hope this helps.

Cheers,
Elemento

Elemento, thank you SO MUCH for those great derivations and for taking the time to write them down and send them to me! (I had been struggling with them for a long time!) It is super clear and very smart.
(And your handwriting is so much better and clearer than mine… :-))
thank you so much
Nicolas

I am glad I could help. Happy Learning :blush:

Cheers,
Elemento
