This variable, da[l-1], just pops out of nowhere in Week 4 of this course, and I am still not clear on why it is needed or how the formula for its calculation comes about.
da[l-1] = W[l].T @ dz[l]
Can someone shed some light on this please?
Hi @khteh ,
This is based on formula (10) in section 6.1, Linear Backward.
What’s the source of your reference? The lecture notes PDF doesn’t have formula (10) or a section 6.1 in it…
Hi @khteh ,
You should find that formula in both graded lab assignments for Week 4.
You should also find the handwritten formula under “Backward propagation for layer l” in the lecture notes.
That formula is the key to how back prop actually works, because it is the output from the calculation at layer l that feeds into and drives the computation at layer l - 1. There are three key outputs from back prop at layer l:
- dW^{[l]} and db^{[l]}, which we use to perform the actual parameter updates at layer l, which is the goal of back propagation.
- dA^{[l-1]}, which passes the gradients back to the previous layer, one step (layer) at a time. That is the key point at which the Chain Rule is applied, and it is what makes the whole process work. It’s where the actual “propagation” happens, right? But we’re going backward instead of forward. (See the sketch just below this list.)
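To make those three outputs concrete, here is a minimal NumPy sketch of the linear part of the backward step, along the lines of the Week 4 assignment. The function name linear_backward and the cache layout are illustrative assumptions:

```python
import numpy as np

def linear_backward(dZ, cache):
    # cache holds (A_prev, W, b) saved during forward prop at this layer.
    # Shapes: dZ (n_l, m), A_prev (n_prev, m), W (n_l, n_prev), b (n_l, 1).
    A_prev, W, b = cache
    m = A_prev.shape[1]  # number of training examples

    dW = (dZ @ A_prev.T) / m                    # drives the update of W^{[l]}
    db = np.sum(dZ, axis=1, keepdims=True) / m  # drives the update of b^{[l]}
    dA_prev = W.T @ dZ                          # dA^{[l-1]}: feeds back prop at layer l-1

    return dA_prev, dW, db
```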
If you want to know where it comes from, that is not really covered in the lectures, because Professor Ng has designed these courses not to require knowledge of matrix calculus. But it’s not that hard to see where it arises. The point is that we are using the Chain Rule to compute the derivatives of the final cost J w.r.t. each parameter at each layer. The key point in forward propagation where the output of the previous layer feeds into the input of the next layer is this:
Z^{[l]} = W^{[l]} \cdot A^{[l-1]} + b^{[l]}
A^{[l]} = g^{[l]}(Z^{[l]})
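As a quick concreteness check (my own illustration, with made-up layer sizes and a ReLU activation), the shapes at that junction look like this:

```python
import numpy as np

# Made-up sizes just for illustration: n_prev units in layer l-1,
# n_l units in layer l, and a batch of m examples.
n_prev, n_l, m = 4, 3, 5
A_prev = np.random.randn(n_prev, m)   # A^{[l-1]}, output of the previous layer
W = np.random.randn(n_l, n_prev)      # W^{[l]}
b = np.zeros((n_l, 1))                # b^{[l]}

Z = W @ A_prev + b                    # Z^{[l]} = W^{[l]} . A^{[l-1]} + b^{[l]}
A = np.maximum(0, Z)                  # A^{[l]} = g^{[l]}(Z^{[l]}), here g = ReLU
print(A.shape)                        # (3, 5), i.e. (n_l, m)
```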
If you take the derivatives of those two equations to get the Chain Rule factors at that layer, you end up with:
dA^{[l-1]} = W^{[l]T} \cdot dZ^{[l]}
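For the curious, here is a quick element-wise sketch of where the transpose comes from (my own summary, not from the lectures). Writing out the linear step per entry:

Z^{[l]}_{ik} = \sum_j W^{[l]}_{ij} A^{[l-1]}_{jk} + b^{[l]}_i \quad \Longrightarrow \quad \frac{\partial Z^{[l]}_{ik}}{\partial A^{[l-1]}_{jk}} = W^{[l]}_{ij}

so the Chain Rule gives

dA^{[l-1]}_{jk} = \sum_i \frac{\partial J}{\partial Z^{[l]}_{ik}} \frac{\partial Z^{[l]}_{ik}}{\partial A^{[l-1]}_{jk}} = \sum_i W^{[l]}_{ij} \, dZ^{[l]}_{ik} = \left( W^{[l]T} \cdot dZ^{[l]} \right)_{jk}

which is exactly the matrix formula above.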
As mentioned above, the full derivation is not covered in the course. Here is a thread with links both to background information about matrix calculus and the actual derivations of back propagation.