W4_A1_Inconsistent cost function notation in formulas 8 and 9

Hello @WinniePooh, and @paulinpaloalto,

First, thank you Paul for your clarification. @WinniePooh, I have to withdraw the suggestion I filed, because those symbols are correct.

Here is my explanation. To see the reasoning, we first need to state all the shapes clearly:

image
image

Let me know if you disagree with / have questions about any of the above.

Note that there are two types of matrices:

  1. matrices for trainable parameters (those that don't have m in their shapes)
  2. matrices for samples (those that have m in their shapes)

Our ultimate goal is to calculate the matrices of type 1, so let's focus on them first. Each element in them is the gradient with respect to one weight, and that gradient combines the contributions of all samples (that is why m no longer appears in the shape: it has been summed over). Therefore, matrices of type 1 contain cost gradients. Note that the cost is the average of the losses (their sum scaled by 1/m), and a loss describes one sample.

Now look at type 2. These matrices have m in their shapes, which means they hold per-sample quantities. Take dZ^{[l]} as an example: for each of the m samples it holds n^{[l]} values, one per neuron in layer l. Since matrices of type 2 are sample-based, each element in them is only a loss gradient. Note again that a loss describes one sample.
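To make the shape argument concrete, here is a minimal NumPy sketch. It is not the assignment's code: the layer sizes, the random data, the purely linear layer, and the squared-error loss are all assumptions chosen so that the per-sample gradient is easy to write down.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: m samples, n_prev units in layer l-1, n_l units in layer l.
m, n_prev, n_l = 5, 4, 3

A_prev = rng.standard_normal((n_prev, m))  # type 2: has m in its shape
W = rng.standard_normal((n_l, n_prev))     # type 1: no m in its shape
b = np.zeros((n_l, 1))                     # type 1: no m in its shape
Y = rng.standard_normal((n_l, m))          # made-up targets, one column per sample

Z = W @ A_prev + b                         # (n_l, m)

# Assumed per-sample loss: L_i = 0.5 * ||z_i - y_i||^2, so dL_i/dz_i = z_i - y_i.
# Column i of dZ is the LOSS gradient for sample i only.
dZ = Z - Y                                 # (n_l, m)

# The COST gradient averages over samples, so m drops out of the shape.
dW = (dZ @ A_prev.T) / m                   # (n_l, n_prev)
db = dZ.sum(axis=1, keepdims=True) / m     # (n_l, 1)

# Same dW, built explicitly as the mean of m per-sample loss gradients.
per_sample = [np.outer(dZ[:, i], A_prev[:, i]) for i in range(m)]
dW_check = np.mean(per_sample, axis=0)

print(dZ.shape, dW.shape, db.shape)
print(np.allclose(dW, dW_check))
```

The matrix product `dZ @ A_prev.T` is where the sum over samples happens, which is why dW ends up with the shape of W while dZ keeps its m columns.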

I will summarize the above with the following two equations, highlighting the m-\mathcal{L} relation.
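One way to write the two relations, assuming the convention that the cost is the average of the per-sample losses, J = \frac{1}{m}\sum_{i} \mathcal{L}^{(i)} (the notation below is a sketch and may differ in detail from the notebook's):

$$
dW^{[l]} = \frac{\partial J}{\partial W^{[l]}} = \frac{1}{m}\sum_{i=1}^{m} \frac{\partial \mathcal{L}^{(i)}}{\partial W^{[l]}}, \qquad \text{shape } (n^{[l]}, n^{[l-1]}) \text{, no } m
$$

$$
dZ^{[l](i)} = \frac{\partial \mathcal{L}^{(i)}}{\partial z^{[l](i)}}, \qquad dZ^{[l]} \text{ has shape } (n^{[l]}, m) \text{, one column per sample}
$$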

@WinniePooh, I am sorry if my previous reply misled you. @paulinpaloalto, thank you again!

Cheers,
Raymond