Backpropagation week 3 vs week 4

Andre_Ramirez · August 1, 2022, 11:07pm

Hello everyone,

In the programming assignment for week 3, we use this formula for backprop: dZ[l] = dA[l] * g′(Z[l]). However, for week 4 we use this one: dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)).

I know the first one is the derivative with respect to Z and the second one is the derivative with respect to A, but why, in week 3, didn’t we use dAL and only use dZ when programming backprop?

anon57530071 · August 2, 2022, 1:00am

Welcome to the community.

Let’s start with the week 4 assignment, “L-Layer model”. Here is an overview.

As there are multiple hidden layers, forward propagation and back-propagation are written as functions that handle one layer, and those functions are repeated for every layer. In the case of back-prop, the input to layer l (from the upper layer) is $da^{[l]}$, and the output (to the lower layer) is $da^{[l-1]}$.
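A minimal NumPy sketch of that repeated per-layer step, assuming ReLU hidden layers and a sigmoid output as in the week 4 assignment. The function names (`layer_backward`, `model_backward`, `model_forward`) and the `(A_prev, W, Z)` cache layout are my own simplification, not the notebook’s exact functions. Note that dAL is only needed once, to start the loop; every layer after that receives its dA from the layer above.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def sigmoid_backward(dA, Z):
    # dZ = dA * g'(Z), with g'(Z) = s * (1 - s) for the sigmoid
    s = sigmoid(Z)
    return dA * s * (1 - s)

def relu_backward(dA, Z):
    # g'(Z) for ReLU is 1 where Z > 0 and 0 elsewhere
    return dA * (Z > 0)

def model_forward(X, params):
    """ReLU hidden layers, sigmoid output; stores (A_prev, W, Z) per layer."""
    caches, A = [], X
    L = len(params) // 2
    for l in range(1, L + 1):
        W, b = params[f"W{l}"], params[f"b{l}"]
        Z = W @ A + b
        caches.append((A, W, Z))
        A = sigmoid(Z) if l == L else np.maximum(0, Z)
    return A, caches

def layer_backward(dA, A_prev, W, Z, g_backward):
    """The repeated step: takes dA[l], returns dA[l-1], dW[l], db[l]."""
    m = A_prev.shape[1]
    dZ = g_backward(dA, Z)                      # dZ[l] = dA[l] * g'(Z[l])
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ                          # handed down to layer l-1
    return dA_prev, dW, db

def model_backward(AL, Y, caches):
    grads, L = {}, len(caches)
    # Back-prop starts from dAL, the derivative of the cross-entropy loss w.r.t. AL
    dA = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    for l in range(L, 0, -1):
        A_prev, W, Z = caches[l - 1]
        g_backward = sigmoid_backward if l == L else relu_backward
        dA, grads[f"dW{l}"], grads[f"db{l}"] = layer_backward(dA, A_prev, W, Z, g_backward)
    return grads

# Toy usage: a 3-layer net (3 -> 4 -> 3 -> 1) on 5 random examples
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
Y = np.array([[1.0, 0.0, 1.0, 1.0, 0.0]])
params = {"W1": rng.standard_normal((4, 3)), "b1": np.zeros((4, 1)),
          "W2": rng.standard_normal((3, 4)), "b2": np.zeros((3, 1)),
          "W3": rng.standard_normal((1, 3)), "b3": np.zeros((1, 1))}
AL, caches = model_forward(X, params)
grads = model_backward(AL, Y, caches)
print({k: v.shape for k, v in grads.items()})
```

The same `layer_backward` is called for every layer; only the choice of `g_backward` changes with the activation.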

On the other hand, the week 3 assignment has only one hidden layer. Here is an overview.

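Since there is only one hidden layer, this exercise calculates back-propagation for both layers at once. (You can see this in the code: forward propagation computes Z1, A1, Z2, A2, and back-propagation computes dW1, dW2, dZ1, dZ2, … for both layers in a single function.) In addition, the chain rule is applied directly, so $da$ is never calculated on its own. For the output layer, the code uses $dz^{[2]} = a^{[2]} - y$, which is what you get by multiplying $da^{[2]}$ (the dAL formula) by the sigmoid derivative $a^{[2]}(1 - a^{[2]})$. For the hidden layer, see $dz^{[1]} = {W^{[2]}}^T dz^{[2]} * g^{[1]\prime}(z^{[1]})$: the term ${W^{[2]}}^T dz^{[2]}$ is exactly $da^{[1]}$.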
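A minimal NumPy sketch to check this numerically, assuming the week 3 setup (tanh hidden layer, sigmoid output, cross-entropy cost). The layer sizes and random data are made up, and the variable names only mirror the assignment’s; this is not the notebook code. It computes dZ2 and dZ1 the week 3 way (no dA anywhere) and again the week 4 way (starting from dAL and going through dA1), then compares them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_x, n_h, m = 2, 4, 6                      # made-up sizes
X = rng.standard_normal((n_x, m))
Y = (rng.random((1, m)) > 0.5).astype(float)
W1, b1 = rng.standard_normal((n_h, n_x)) * 0.5, np.zeros((n_h, 1))
W2, b2 = rng.standard_normal((1, n_h)) * 0.5, np.zeros((1, 1))

# Forward pass: tanh hidden layer, sigmoid output
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

# Week 3 style: dA never appears explicitly
dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / m
db2 = np.sum(dZ2, axis=1, keepdims=True) / m
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)         # tanh'(Z1) = 1 - A1**2
dW1 = dZ1 @ X.T / m
db1 = np.sum(dZ1, axis=1, keepdims=True) / m

# Week 4 style: start from dAL and pass dA down explicitly
dA2 = -(np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))
dZ2_via_dA = dA2 * A2 * (1 - A2)           # dZ = dA * sigmoid'(Z2)
dA1 = W2.T @ dZ2_via_dA
dZ1_via_dA = dA1 * (1 - A1 ** 2)

print(np.allclose(dZ2, dZ2_via_dA), np.allclose(dZ1, dZ1_via_dA))  # prints: True True
```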

That is why dAL does not appear in the week 3 exercise.

Andre_Ramirez · August 2, 2022, 11:36pm

Hi Nobu_Asai. Thanks for the explanation; it makes sense to me. Basically, it comes down to the number of layers.

Let me take this opportunity to ask a follow-up question:

We need dAL to initialize backprop, and Prof. Andrew mentions that when using logistic regression (binary classification), the derivative is - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)).
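For reference, that expression is the derivative of the per-example cross-entropy loss -(Y*log(AL) + (1-Y)*log(1-AL)) with respect to AL. A quick finite-difference check in NumPy, using made-up values for AL and Y, is one way to confirm it:

```python
import numpy as np

def cross_entropy(AL, Y):
    # Per-example binary cross-entropy loss (not averaged over examples)
    return -(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))

def dAL_formula(AL, Y):
    # The expression used to initialize backprop in the week 4 notebook
    return -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

AL = np.array([[0.2, 0.7, 0.9, 0.4]])   # made-up sigmoid outputs
Y = np.array([[0.0, 1.0, 1.0, 0.0]])    # made-up labels
eps = 1e-6
# Central difference; each loss entry depends only on its own AL entry
numeric = (cross_entropy(AL + eps, Y) - cross_entropy(AL - eps, Y)) / (2 * eps)
print(np.max(np.abs(numeric - dAL_formula(AL, Y))))
```

Since the formula comes from the loss, it says nothing by itself about which activation produced AL.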

So, my question is: what would the derivative be if the activation function for my last (output) layer, gL(ZL), is not sigmoid but, for example, tanh or ReLU? Is there a sheet or blog post that indicates which derivative to use to initialize backprop when the activation function is not sigmoid?

P.S. Sorry for putting the formulas in “plain” format, but I didn’t know how to write them in “formula” syntax.

anon57530071 · August 3, 2022, 2:09am

I think this answers your questions. Another one is also helpful if you want to try other types of classification.
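A minimal NumPy sketch of the general pattern: dAL is the derivative of the loss with respect to AL, so it changes only when the loss changes, while the output activation enters only through g′(ZL) in dZL = dAL * g′(ZL). The squared-error loss used for the tanh/ReLU case is an example I picked, not something from the course, and the dictionaries and `output_layer_dZ` are hypothetical helpers for illustration.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

# g'(Z) for some common output activations
ACTIVATION_GRADS = {
    "sigmoid": lambda Z: sigmoid(Z) * (1 - sigmoid(Z)),
    "tanh": lambda Z: 1 - np.tanh(Z) ** 2,
    "relu": lambda Z: (Z > 0).astype(float),
}

# dL/dAL for some per-example losses; this is what initializes back-prop
LOSS_GRADS = {
    "cross_entropy": lambda AL, Y: -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)),
    "squared_error": lambda AL, Y: AL - Y,   # loss = 0.5 * (AL - Y) ** 2
}

def output_layer_dZ(ZL, AL, Y, activation, loss):
    """Hypothetical helper: dZL = dAL * g'(ZL)."""
    dAL = LOSS_GRADS[loss](AL, Y)                    # depends only on the loss
    return dAL * ACTIVATION_GRADS[activation](ZL)    # activation enters only here

ZL = np.array([[-1.0, 0.3, 2.0]])

# Sigmoid + cross-entropy collapses to AL - Y
Y01 = np.array([[0.0, 1.0, 1.0]])
A_sig = sigmoid(ZL)
print(np.allclose(output_layer_dZ(ZL, A_sig, Y01, "sigmoid", "cross_entropy"), A_sig - Y01))

# Tanh output paired with squared error (targets in (-1, 1))
Y_tanh = np.array([[-0.5, 0.0, 0.8]])
dZ_tanh = output_layer_dZ(ZL, np.tanh(ZL), Y_tanh, "tanh", "squared_error")
```

The cross-entropy formula assumes AL is strictly between 0 and 1, which is why it pairs naturally with a sigmoid output; tanh or ReLU outputs are usually combined with a different loss, and that loss is what determines dAL.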

Hope this helps.

By the way, you can insert a formula by typing “$” to start LaTeX and another “$” to end it. But it is not necessary in most cases, unless there are complex subscripts, superscripts, or accents (like tilde, hat, …).
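For example, the two formulas from the first post could be typed like this, assuming the forum renders text between “$” signs as inline math:

```latex
$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]})$
$dA^{[L]} = -\left(\frac{Y}{A^{[L]}} - \frac{1 - Y}{1 - A^{[L]}}\right)$
```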

Andre_Ramirez · August 5, 2022, 2:31pm

Thanks Nobu_Asai. I appreciate your help.
