Hi everyone, I have the following question. Some of the derivatives, for example dz = a(1-a), were calculated when the chosen activation function was the sigmoid. Now we are using tanh, and later we might use ReLU. How can we get the correct equations? I hope I'm making myself clear.
Thanks in advance
You would have to understand enough calculus to compute the partial derivative of the cost equation using the tanh() activation.
The derivative of the sigmoid is not the same as the derivative of the tanh, so the equation should at least be different.
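For reference, these are the standard results, writing $a = g(z)$ in each case:

$$\sigma'(z) = a(1 - a) \qquad\text{and}\qquad \tanh'(z) = 1 - a^2,$$

so the two expressions do indeed differ.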
Yes, you need to calculate the derivative of each of the activation functions. That is covered in the various assignments where we use something other than sigmoid. You'll see tanh in the Planar Data assignment in Week 3 and then you'll see ReLU in the "Step by Step" assignment in Week 4. The relevant general equation that involves the activation function at each layer is:
dZ^{[l]} = dA^{[l]} * g^{[l]'}(Z^{[l]})
Of course the derivative g'() depends on what the function g() is.
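To make that concrete, here is a minimal numpy sketch (my own illustration, not the assignment code, which, if I recall correctly, packages these as helpers like `sigmoid_backward` and `relu_backward`) of computing $g'(Z)$ for each activation and applying $dZ = dA * g'(Z)$:

```python
import numpy as np

# Derivatives of the common activation functions, evaluated at Z.
# Illustrative sketch only; not the course's actual helper functions.

def sigmoid_prime(Z):
    A = 1 / (1 + np.exp(-Z))
    return A * (1 - A)            # g'(z) = a(1 - a)

def tanh_prime(Z):
    A = np.tanh(Z)
    return 1 - A ** 2             # g'(z) = 1 - a^2

def relu_prime(Z):
    return (Z > 0).astype(float)  # g'(z) = 1 if z > 0 else 0

def dZ_from_dA(dA, Z, activation):
    """Apply dZ = dA * g'(Z) for the chosen activation (hypothetical helper)."""
    primes = {"sigmoid": sigmoid_prime, "tanh": tanh_prime, "relu": relu_prime}
    return dA * primes[activation](Z)
```

The only thing that changes when you switch activation functions is which $g'$ gets plugged into that one formula.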
But notice that in all cases here in DLS Course 1, we are doing binary classifications. That means that the activation at the output layer is always sigmoid, and that's the derivative that interacts with the loss function. Here's a thread showing how all the derivatives play out at the output layer in a binary classification.
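In case the linked thread isn't handy, here is a sketch of the standard result at the output layer. With the cross entropy loss $\mathcal{L}(a, y) = -\big(y \log a + (1 - y)\log(1 - a)\big)$ and $a = \sigma(z)$:

$$\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a}, \qquad \frac{\partial a}{\partial z} = a(1 - a),$$

so by the chain rule

$$dZ^{[L]} = \frac{\partial \mathcal{L}}{\partial a} \cdot \frac{\partial a}{\partial z} = a - y,$$

which is why the output layer computation ends up being simply `AL - Y`, regardless of which activations the hidden layers use.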
Thanks for the answer, but backpropagation derivatives go all the way back to the beginning of the NN, so when we use tanh as an activation function the formula should change, right? Also, I saw in the optional reading that the formula of the derivative when using softmax is different as well. How do we build a NN that takes those changes into account?
Have you gotten to week 4 of DLS Course 1 yet? There Prof Ng shows us the back propagation formulas at each layer and how the activation functions affect both forward and backward propagation. I gave the key formula earlier in this thread which shows the point at which the derivative of the activation function affects the results.
The overall point is that back propagation starts from the output layer, and we process each layer one at a time, stepping backwards through the layers. For every layer other than the output (last) layer, the input to the computation for the current layer is the output of the back prop calculation from the next (later) layer. So we apply the formula I showed above at each layer and then it literally "propagates" backwards to the previous layers. So if we use tanh at layer 3 of a 4 layer network, then the derivative of tanh affects the results at layer 3, which then affects the results at layer 2 and layer 1. That's what they mean by "backward propagation".
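Here is a hedged sketch of that backward loop in numpy (my own illustration with invented names like `linear_backward` and `backward_pass`, not the assignment's actual functions), assuming the forward pass cached `(A_prev, W, Z)` for each layer:

```python
import numpy as np

def linear_backward(dZ, A_prev, W):
    """Gradients of the linear step Z = W @ A_prev + b."""
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

def backward_pass(AL, Y, caches, hidden_activation="tanh"):
    """Sketch of back prop for a binary classifier.

    caches[l] holds (A_prev, W, Z) saved during forward prop for layer l+1.
    The output layer uses sigmoid + cross entropy, so dZ at layer L is AL - Y.
    """
    grads = {}
    L = len(caches)

    # Output layer: sigmoid + cross entropy collapses to AL - Y.
    A_prev, W, Z = caches[L - 1]
    dZ = AL - Y
    dA_prev, grads[f"dW{L}"], grads[f"db{L}"] = linear_backward(dZ, A_prev, W)

    # Hidden layers: dZ = dA * g'(Z), stepping backwards one layer at a time.
    for l in reversed(range(L - 1)):
        A_prev, W, Z = caches[l]
        if hidden_activation == "tanh":
            dZ = dA_prev * (1 - np.tanh(Z) ** 2)
        else:  # relu
            dZ = dA_prev * (Z > 0)
        dA_prev, grads[f"dW{l+1}"], grads[f"db{l+1}"] = linear_backward(dZ, A_prev, W)

    return grads
```

Notice that the only place the choice of hidden activation shows up is the single line that computes $g'(Z)$; everything else is the same regardless of which activation you pick.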
This is all covered in the lectures and in the assignments. If you have not yet gotten through DLS Course 1, I suggest you “hold that thought” and proceed with the course and listen to what Prof Ng explains and then you’ll get to implement it in the assignments. It should all be clear after that. If not, we can discuss more.