Backpropagation week 3 vs week 4

Hello everyone,

In the programming assignment for week 3 we use this formula for backprop: dZ[l] = dA[l] * g'(Z[l]). However, for week 4 we use this one: dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)).

I know the first one is the derivative with respect to Z and the second one is with respect to A, but my confusion is: why didn't we use dAL in week 3, but only dZ, when programming backprop?


Welcome to the community.

Let's start from the week 4 assignment, which is the "L-Layer model". Here is an overview.

Because there are multiple hidden layers, functions such as forward-propagation and back-propagation are written for a single layer and then called repeatedly, once per layer. In back-prop, the input to layer l (from the upper layer) is $da^{[l]}$, and the output (to the lower layer) is $da^{[l-1]}$.
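As a rough illustration of that per-layer interface, here is a minimal NumPy sketch. The function names (`relu_backward`, `linear_backward`, `layer_backward`), the ReLU activation, and the shapes are assumptions made for this example, not the assignment's actual code.

```python
import numpy as np

def relu_backward(dA, Z):
    # dZ = dA * g'(Z), where g is ReLU, so g'(Z) is 1 for Z > 0 and 0 otherwise
    return dA * (Z > 0)

def linear_backward(dZ, A_prev, W):
    # Gradients of the linear step Z = W @ A_prev + b, averaged over m examples
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ  # this is what gets passed down to layer l-1
    return dA_prev, dW, db

def layer_backward(dA, Z, A_prev, W):
    # One reusable back-prop step: takes dA[l], returns dA[l-1], dW[l], db[l]
    dZ = relu_backward(dA, Z)
    return linear_backward(dZ, A_prev, W)

# Hypothetical shapes: layer l has 3 units, layer l-1 has 4 units, 5 examples
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
A_prev = rng.normal(size=(4, 5))
Z = W @ A_prev
dA = rng.normal(size=(3, 5))

dA_prev, dW, db = layer_backward(dA, Z, A_prev, W)
```

Because each call only needs `dA` for its own layer, the same function can be called in a loop from layer L down to layer 1.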

On the other hand, the week 3 assignment has only one hidden layer. Here is an overview.

Since there is only one hidden layer, this exercise calculates back-propagation for both layers at once. (You can see that, in both forward-propagation and back-propagation, several quantities for both layers, such as dW1, dW2, dZ1, dZ2, ..., are calculated together.) In addition, by using the chain rule, the explicit calculation of da is bypassed. (See $dz^{[1]} = {w^{[2]}}^T dz^{[2]} * g^{[1]'}(z^{[1]})$.)

The above is the reason why dAL did not appear in the Week 3 exercise.
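To make the bypass concrete, here is a small numerical check. It assumes a sigmoid output layer with the cross-entropy loss; the variable names and the random data are made up for this example. Under that assumption, dA2 * g'(Z2) simplifies algebraically to A2 - Y, so dZ2 can be computed directly without ever forming dA2.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

# Made-up output-layer pre-activations and labels for 5 examples
rng = np.random.default_rng(0)
Z2 = rng.normal(size=(1, 5))
Y = np.array([[1.0, 0.0, 1.0, 1.0, 0.0]])
A2 = sigmoid(Z2)

# Long route: dA first, then dZ = dA * g'(Z) with g'(Z) = A * (1 - A) for sigmoid
dA2 = -(np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))
dZ2_long = dA2 * A2 * (1 - A2)

# Short route: the chain rule already applied by hand
dZ2_short = A2 - Y

same = np.allclose(dZ2_long, dZ2_short)
```

Both routes give the same dZ2, which is why the dA term does not have to appear explicitly when the layers are handled together.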

Hi Nobu_Asai. Thanks for the explanation, it makes sense to me. Basically, it has to do with the number of layers.

I'd like to take the opportunity to ask the following question:

We need dAL to initialize backprop, and Prof. Andrew mentions that when using logistic regression or binary classification the derivative is - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)).

So, my question is: what would the derivative be if my activation function for the last (output) layer, gL(ZL), is not sigmoid, but tanh or ReLU, for example? Is there any sheet or blog that indicates which derivative to use to initialize backprop in case the activation function is not sigmoid?

P.S. Sorry for putting the formulas in "plain" format, but I didn't know how to write them in "formula" syntax.

I think this provides answers to your questions. Another one is also helpful if you want to start different classifications.
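As a rough sketch of how the pieces separate: dAL comes from the loss function, while the output activation only enters through g'(ZL) in dZL = dAL * g'(ZL). The helper names below and the squared-error loss are assumptions chosen for illustration. Note that the cross-entropy expression for dAL assumes AL lies strictly between 0 and 1, so a tanh or ReLU output is normally paired with a different loss.

```python
import numpy as np

def sigmoid_prime(Z):
    s = 1.0 / (1.0 + np.exp(-Z))
    return s * (1.0 - s)

def tanh_prime(Z):
    return 1.0 - np.tanh(Z) ** 2

def relu_prime(Z):
    # Derivative taken as 0 at Z == 0 by convention
    return (Z > 0).astype(float)

def output_dZ(dAL, ZL, g_prime):
    # dZL = dAL * g'(ZL): the loss supplies dAL, the activation supplies g'
    return dAL * g_prime(ZL)

# Example with an assumed squared-error loss 0.5 * (AL - Y)**2, so dAL = AL - Y
ZL = np.array([[-1.0, 0.5, 2.0]])
Y = np.array([[0.0, 1.0, 1.0]])
AL = np.tanh(ZL)          # tanh as the output activation
dAL = AL - Y
dZL = output_dZ(dAL, ZL, tanh_prime)
```

Swapping the output activation means swapping `g_prime`; swapping the loss means swapping the expression for `dAL`.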

Hope this helps.

By the way, you can use "$" to start LaTeX and "$" again to end it when inserting a formula. But it is not necessary in most cases, as long as there are no very complex subscripts, superscripts, or accents (like tilde, hat, ...). :slight_smile:

Thanks Nobu_Asai. I appreciate your help.