In week 4, Professor Ng describes the building blocks for a single training iteration: the forward pass and the backward pass. I understand why we should calculate dW and db: those rates of change eventually affect the cost function (in the next iteration, through the update of W and b).
What I don’t understand, however, is why it’s necessary to calculate the derivative of Z. Also, why do we bother to calculate the derivative of A at all?
Hi, @Sua. The short answer is that taking the derivative of Z is a necessary step in computing the derivative of A through the chain rule of calculus. Recall that A is a composite function: A^{l} = g\left(Z^{l}\right), where Z^{l} = WA^{l-1} + b and g\left(\cdot\right) is the activation function. We can write this more generally (for any given layer) as A = g\left(Z(W, b)\right). Using the chain rule:
\frac{dA}{dW} = \frac{dg}{dZ} \frac{dZ}{dW} and \frac{dA}{db} = \frac{dg}{dZ} \frac{dZ}{db}.
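To make that concrete, here is a minimal NumPy sketch of the backward step for a single layer (the function name, g_prime, and the other variable names are mine, just for illustration; this is not the assignment's exact code). The point is that dZ is obtained from dA via the chain rule, and dW and db then fall out of dZ:

```python
import numpy as np

def linear_activation_backward(dA, Z, A_prev, W, g_prime):
    """Backward step for one layer, where A = g(Z) and Z = W @ A_prev + b.

    dA      : gradient of the cost w.r.t. this layer's activation A
    Z       : cached pre-activation from the forward pass
    A_prev  : cached activation of the previous layer
    W       : this layer's weight matrix
    g_prime : derivative of the activation function, applied elementwise
    """
    m = A_prev.shape[1]                        # number of examples

    dZ = dA * g_prime(Z)                       # chain rule: dJ/dZ = dJ/dA * g'(Z)
    dW = (dZ @ A_prev.T) / m                   # dJ/dW = dJ/dZ * dZ/dW
    db = np.sum(dZ, axis=1, keepdims=True) / m # dJ/db = dJ/dZ * dZ/db
    dA_prev = W.T @ dZ                         # gradient handed back to layer l-1

    return dA_prev, dW, db

# Example with a ReLU layer (2 units, 3 inputs, 4 examples)
rng = np.random.default_rng(0)
A_prev = rng.standard_normal((3, 4))
W, b = rng.standard_normal((2, 3)), np.zeros((2, 1))
Z = W @ A_prev + b
dA = rng.standard_normal((2, 4))               # pretend this came from the layer above
relu_prime = lambda z: (z > 0).astype(float)
dA_prev, dW, db = linear_activation_backward(dA, Z, A_prev, W, relu_prime)
```

Notice that the function also returns dA_prev, which becomes the incoming dA for layer l-1; that is how the gradient keeps flowing backwards through the network.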
Okay, so I get that part, but I think the fundamental question still stands: why do we bother taking the derivative of A?
Like, I try explaining it to myself but I can’t finish the sentence: “Knowing the rate of change of the Activation layer allows us to…”
It seems to me that if we have the weights and bias, and we know how those affect the loss, that’s all we need, right?
But it all goes through the activation function at every layer, right? We are composing functions and then using the Chain Rule to compute the gradients (derivatives). You need to take Ken’s point and apply it at the level of the cost. If we want the derivative of the cost J w.r.t. some parameter, we need the derivative of every function between that parameter and the cost, right?
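To finish your sentence: knowing the rate of change of each activation allows us to relay the gradient from the cost all the way back to the parameters of every earlier layer. For a weight in layer l, the chain looks roughly like this (writing L for the output layer and abusing the derivative notation a bit):

\frac{dJ}{dW^{l}} = \frac{dJ}{dA^{L}} \frac{dA^{L}}{dZ^{L}} \frac{dZ^{L}}{dA^{L-1}} \cdots \frac{dA^{l}}{dZ^{l}} \frac{dZ^{l}}{dW^{l}}.

Every factor of the form \frac{dA}{dZ} or \frac{dZ}{dA} belongs to one of the layers sitting between W^{l} and the cost, which is why we have to compute the derivative of A (and of Z) at every layer on the way back.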