Assignment Building NN C1 Week 4

I managed to successfully submit both programming assignments in Course 1 / Week 4, but I’m still struggling to understand why the code works and to reconcile it with Andrew’s “Backpropagation Intuition” slides, in particular the “Summary of gradient descent” slide.

Three questions:

  1. In Andrew’s slide we start by computing dZ[2] = A[2] - Y and we don’t use dA[2].
    In the “two_layer_model” function we start by calculating dA2 and don’t compute dZ[2] directly.

Why?

  2. In the first programming assignment, paragraph 6.1, equation 10 gives the formula for dA[l-1]. Where does this formula come from? I can’t find it derived in Andrew’s slides (maybe I missed it).

  3. Overall: I don’t see how the algorithm in the programming assignment follows the six equations from Andrew’s slide (for example, we calculate dA_prev, which doesn’t appear in those equations). Why is that?

Although I got 100% on the assignment, I feel my understanding of it is superficial: I don’t fully understand the algorithm or how it applies Andrew’s concepts… I have probably missed something.

Many thanks for your help on any or all of the questions above.

Hi @Nicolas, there is no week 4 in course 1. Please share the correct course and week numbers. Since you mentioned slides, please also share which video, and the timestamp of the content you want to discuss.

Sorry for the mistake. This question is for the Deep Learning Specialization; I posted it in the wrong forum. Please ignore.

thanks


No problem Nicolas. Closing this.

Hi @Nicolas_Hirel,

I shall move this post to DLS

Hey @Nicolas_Hirel,

Prof Andrew derives this formula in the video entitled “Forward and Backward Propagation” in Week 4 of Course 1.

As to this, I am assuming you are referring to the video entitled “Backpropagation Intuition (Optional)”. In this video, Prof Andrew does indeed start with the calculation of dz^{[2]}, but you will also find him stating that:

I’m going to skip explicitly computing da. If you want, you can actually compute da^2, and then use that to compute dz^2. But in practice, you could collapse both of these steps into one step.

And this is what Prof Andrew has done in this video, I assume for the sake of simplicity. But below you can find the complete derivation, if that is what you are looking for. Also, I have attached the network just for our reference.

Here, a^{[2]} forms the predictions for this particular model, i.e., \hat{y}, so:

L(a^{[2]}, y) = -y \log(a^{[2]}) - (1 - y) \log(1 - a^{[2]}) \\ da^{[2]} = \dfrac{\partial L}{\partial a^{[2]}} = -\dfrac{y}{a^{[2]}} + \dfrac{1-y}{1-a^{[2]}} \\ da^{[2]} = \dfrac{a^{[2]} - y}{a^{[2]} (1 - a^{[2]})}
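As a side note, this da^{[2]} is the quantity the notebook’s backward pass starts from. Assuming numpy arrays AL (the sigmoid outputs of the last layer) and Y (the labels) of the same shape, the elementwise version looks something like this (my own illustration with made-up values, not a quote from the notebook):

```python
import numpy as np

# Hypothetical values for a batch of 3 examples
AL = np.array([[0.8, 0.2, 0.6]])   # a^{[2]}: sigmoid outputs of the output layer
Y  = np.array([[1.0, 0.0, 1.0]])   # ground-truth labels

# da^{[2]} = -y/a + (1-y)/(1-a), applied elementwise over the batch
dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
print(dAL)   # [[-1.25        1.25       -1.66666667]]
```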

Now, moving ahead:

dz^{[2]} = \dfrac{\partial L}{\partial a^{[2]}} \cdot \dfrac{\partial a^{[2]}}{\partial z^{[2]}} = da^{[2]} \cdot \dfrac{\partial a^{[2]}}{\partial z^{[2]}} \\ \dfrac{\partial a^{[2]}}{\partial z^{[2]}} = \dfrac{e^{-z^{[2]}}}{(1 + e^{-z^{[2]}})^2} = \dfrac{1}{1 + e^{-z^{[2]}}} \cdot \dfrac{e^{-z^{[2]}}}{1 + e^{-z^{[2]}}} \\ \dfrac{\partial a^{[2]}}{\partial z^{[2]}} = \sigma(z^{[2]}) \, (1 - \sigma(z^{[2]})) = a^{[2]} (1 - a^{[2]}) \\ dz^{[2]} = da^{[2]} \cdot a^{[2]} (1 - a^{[2]}) \\ dz^{[2]} = a^{[2]} - y

In this derivation, I am assuming that you are comfortable with basic calculus, which is probably one of the reasons Prof Andrew skipped it in the lecture. Let me know if this helps.
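If it helps, here is a quick numerical sanity check (my own numpy sketch with made-up values, not part of the assignment) showing that the two-step route da^{[2]} → dz^{[2]} and the collapsed lecture formula dz^{[2]} = a^{[2]} - y give the same numbers:

```python
import numpy as np

A2 = np.array([[0.8, 0.2, 0.6]])   # hypothetical sigmoid outputs a^{[2]}
Y  = np.array([[1.0, 0.0, 1.0]])   # hypothetical labels

dA2 = -(np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))   # da^{[2]}
dZ2_two_step  = dA2 * A2 * (1 - A2)                    # da^{[2]} * sigma'(z^{[2]})
dZ2_collapsed = A2 - Y                                  # collapsed formula from the lecture

print(np.allclose(dZ2_two_step, dZ2_collapsed))        # True
```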

Cheers,
Elemento

Elemento, many thanks for your very precise answer. This is indeed extremely useful.

Would you by any chance know how to prove dA^{[l-1]} = W^{[l]T} \cdot dZ^{[l]}?
As mentioned by Mentor Mubsi (thanks Mubsi!), Prof Andrew states this equation on the first slide of “Forward and Backward Propagation” in Week 4, but chose not to explain it in more detail (for the sake of simplicity). I’ve tried to derive it myself but couldn’t. If not, please ignore my question (I’m just curious about it).

many thanks!

Hey @Nicolas_Hirel,
Please find the derivation below, and do ignore my handwriting. I was a little too lazy to type this up in LaTeX :joy:, that’s why I resorted to the traditional method of conveying information.
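In short, here is a typed sketch of the main chain-rule step (my summary, in case the handwritten image is hard to read). Writing the linear part of layer l component-wise:

z^{[l]}_i = \sum_k W^{[l]}_{ik} \, a^{[l-1]}_k + b^{[l]}_i \implies \dfrac{\partial z^{[l]}_i}{\partial a^{[l-1]}_k} = W^{[l]}_{ik} \\ da^{[l-1]}_k = \dfrac{\partial L}{\partial a^{[l-1]}_k} = \sum_i \dfrac{\partial L}{\partial z^{[l]}_i} \cdot \dfrac{\partial z^{[l]}_i}{\partial a^{[l-1]}_k} = \sum_i W^{[l]}_{ik} \, dz^{[l]}_i = \left( W^{[l]T} dz^{[l]} \right)_k

which, stacked over all units (and over the examples in a batch), is exactly dA^{[l-1]} = W^{[l]T} dZ^{[l]}.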

I hope this helps.

Cheers,
Elemento

Elemento, thank you SO MUCH for those great derivations and for taking the time to write them down and send them to me! (I had been struggling with them for a long time!) It is super clear and very smart.
(And your handwriting is so much better and clearer than mine… :-))
thank you so much
Nicolas

I am glad I could help. Happy Learning :blush:

Cheers,
Elemento
