@marieak, @paulinpaloalto is better at explaining this, but a really easy (at least in my mind) way to think about it: if you see 1/m or 1/n (or something similar) in front of a sum, it basically reads as an ‘average’ or ‘mean’. In a sense, it is a scalar.
The average or mean has no ‘rate of change’. It is kind of just… fixed.
I am not sure, though, exactly which course you are referencing here, so I can’t provide more detail.
Thanks so much for your prompt reply, and happy new year!!
I’m referring to the vectorization of back-propagation.
While I follow this:
I struggle to understand how we discard the 1/m term in the expression for dZ = A - Y in the screenshot below
(with the additional step of dA = (A - Y) / ( A (1 - A) ) not shown)
If we are using the definition of J as the loss averaged over m examples (with the 1/m factor), as provided in my initial post, shouldn’t we have dZ = (1/m)(A - Y)?
Especially since the (1/m) term is later re-introduced when we compute dW and db.
The ‘1/m’ term is only relevant when you’re computing the gradient of the overall cost function J. When you compute the derivative with respect to the individual loss L, the 1/m is excluded because you’re not averaging over m samples at that point; you’re simply analyzing the contribution of a single training example.
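To make that concrete, here is a small numeric check of my own (not from the course notebooks), assuming the standard sigmoid activation with cross-entropy loss: the per-example gradient dL/dz works out to a - y with no 1/m factor, which you can confirm against a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(a, y):
    # cross-entropy loss for a single training example
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

z, y = 0.7, 1.0
a = sigmoid(z)

# analytic per-example gradient: dL/dz = a - y (note: no 1/m here)
analytic = a - y

# numerical gradient of L w.r.t. z via central differences
eps = 1e-6
numeric = (loss(sigmoid(z + eps), y) - loss(sigmoid(z - eps), y)) / (2 * eps)

print(analytic, numeric)  # the two agree to several decimal places

# The 1/m only shows up once you average over m examples to get the
# gradient of the cost J, e.g. dW = (1/m) * dZ @ A_prev.T
```

So dZ = A - Y is a per-example (loss) quantity stacked into a matrix, and the 1/m is applied exactly once, at the averaging step that produces dW and db.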
Again, as @Mushi says, in your first screenshot you are referring to the computation of the derivative per example. Indeed, when you average the derivatives, you need the 1/m factor (m being the number of examples).
@mushi has given the complete and precise answer above, but here’s another past thread that discusses the same point. The issue is that Prof Ng’s notation is ambiguous: you have to pay attention to whether the dSomething
value is a derivative of L or J.
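To spell out the notation issue, here is a sketch of the chain-rule step (my own summary of the standard sigmoid/cross-entropy derivation, consistent with the dA expression quoted above): the per-example derivative of L carries no 1/m, while the derivative of J does.

```latex
% Per-example: derivative of the loss L with respect to z
\frac{\partial L}{\partial a} = \frac{a - y}{a(1-a)}, \qquad
\frac{\partial a}{\partial z} = a(1-a)
\quad\Rightarrow\quad
\frac{\partial L}{\partial z}
  = \frac{\partial L}{\partial a}\cdot\frac{\partial a}{\partial z}
  = a - y

% Over all m examples: the 1/m enters only through the cost J
J = \frac{1}{m}\sum_{i=1}^{m} L^{(i)}
\quad\Rightarrow\quad
dW = \frac{\partial J}{\partial W} = \frac{1}{m}\, dZ\, A_{\text{prev}}^{T}, \qquad
db = \frac{1}{m}\sum_{i=1}^{m} dz^{(i)}
```

So when you see dZ = A - Y, read dZ as a derivative of L (stacked column-wise); when you see the 1/m in dW and db, those are derivatives of J.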