W4_A1_Inconsistent cost function notation in formula 8 and 9

rmwkwok · January 18, 2023, 2:47am

First, thank you Paul for your clarification. @WinniePooh, I have to withdraw the suggestion that I have filed because those symbols are correct.

Here is my version of the explanation. We need to state all the shapes clearly to see the reasoning behind it:

Let me know if you disagree with / have questions about any of the above.

Note that there are two types of matrices:

1. matrices for trainable parameters (those that don’t have m in their shapes)
2. matrices for samples (those that have m in their shapes)
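
For concreteness, here is how the two types could be written out. This assumes the usual course convention that layer l has n^{[l]} units and that the m samples are stacked as columns:

$$
\begin{aligned}
\text{type 1:}\quad & W^{[l]},\ dW^{[l]} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}, \qquad b^{[l]},\ db^{[l]} \in \mathbb{R}^{n^{[l]} \times 1} \\
\text{type 2:}\quad & Z^{[l]},\ A^{[l]},\ dZ^{[l]},\ dA^{[l]} \in \mathbb{R}^{n^{[l]} \times m}
\end{aligned}
$$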

Our ultimate goal is to calculate the matrices of type (1), so let’s focus on them first. Each element in them is the gradient with respect to a single parameter, and that gradient accumulates the contributions of all samples (that’s why m has disappeared from the shape: it has been summed over). Therefore, matrices of type (1) contain cost gradients. Note that the cost is the average of the losses over all m samples, whereas a loss describes a single sample.

Now, let’s look at type (2). These matrices have m in their shapes, meaning that they hold per-sample quantities. Take dZ^{[l]} as an example: for each of the m samples it has n^{[l]} entries, because there are n^{[l]} neurons in layer l. Since matrices of type (2) are sample-based, each element in them is only a loss gradient. Note again that a loss describes a single sample.
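
A minimal NumPy sketch of the point above. The layer sizes and the random values standing in for A^{[l-1]} and dZ^{[l]} are made up for illustration, and the variable names are my own, not taken from the assignment. It only checks that the matrix product sums over the sample axis, so m drops out of the shape of dW and db:

```python
import numpy as np

rng = np.random.default_rng(0)

m = 5       # number of samples
n_prev = 4  # n^{[l-1]}: units in layer l-1 (illustrative value)
n = 3       # n^{[l]}: units in layer l (illustrative value)

# Type 2 matrices: one column per sample, so m appears in the shape.
# Random stand-ins here, not the result of a real forward/backward pass.
A_prev = rng.standard_normal((n_prev, m))  # A^{[l-1]}
dZ = rng.standard_normal((n, m))           # per-sample loss gradients dL/dZ^{[l]}

# Type 1 matrices: the product contracts the m axis, and dividing by m
# turns the sum of per-sample loss gradients into a cost gradient.
dW = (dZ @ A_prev.T) / m                   # shape (n, n_prev)
db = dZ.sum(axis=1, keepdims=True) / m     # shape (n, 1)

# The same dW, written as an explicit average over the samples.
dW_loop = np.zeros((n, n_prev))
for i in range(m):
    dW_loop += np.outer(dZ[:, i], A_prev[:, i])  # contribution of sample i
dW_loop /= m

assert dW.shape == (n, n_prev)
assert db.shape == (n, 1)
assert np.allclose(dW, dW_loop)
```

The loop version makes the "summed over all samples" step visible: each pass adds one sample's loss gradient, and only the final average is a gradient of the cost.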

I will summarize the above with the following two equations, highlighting the m-\mathcal{L} relation.
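
A sketch of that relation in the course’s notation, assuming the cost is defined as the average loss and that samples are stored as columns (the superscript (i) indexes samples):

$$
\mathcal{J} = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}^{(i)}, \qquad dZ^{[l](i)} = \frac{\partial \mathcal{L}^{(i)}}{\partial Z^{[l](i)}}
$$

$$
dW^{[l]} = \frac{\partial \mathcal{J}}{\partial W^{[l]}} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)} A^{[l-1](i)T} = \frac{1}{m}\, dZ^{[l]} A^{[l-1]T}
$$

The \frac{1}{m} and the sum over i appear exactly where \mathcal{L} turns into \mathcal{J}.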

@WinniePooh, I am sorry if my previous reply misled you. @paulinpaloalto, thank you again!

Cheers,
Raymond
