Hi mentors.
I am taking a big step back at the moment to revise and improve my understanding, as opposed to just obtaining results.
Can you clear up something I didn’t take the time to understand properly right at the start?
Given an input matrix X[n,m], is there a J for each of the n rows, summed across the m examples x(i)?
My logic says that J must be an [n,2] array, but I haven’t seen this defined anywhere. I would expect there to be n lowercase j sums, with j[0] = sum(dW) and j[1] = sum(db).
Regards
Ian
There is one loss value for each sample, that is, for each column of X, and each of those loss values is a scalar. Then we define the cost J as the average of the loss values across all m samples.
Yes, I should have written ‘average’; I understood that.
The implication of what you say is that J (capital) is a single number at each iteration of the calculation?
To express it in math formulas, we have the loss first:
L(\hat{y}_i, y_i) = - y_i \log(\hat{y}_i) - (1 - y_i) \log(1 - \hat{y}_i)
Then the cost is defined as the average of the loss values across the samples:
J = \displaystyle \frac {1}{m} \sum_{i = 1}^{m} L(\hat{y}_i, y_i)
Yes, J is a scalar value at every iteration.
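In case it helps to see the shapes, here is a minimal NumPy sketch of the two formulas above. The function names and the sample values are made up for illustration and are not taken from the course notebooks; it assumes the predictions and labels are stored as row vectors of shape (1, m).

```python
import numpy as np

def per_sample_loss(y_hat, y):
    # Cross-entropy loss for each sample; same shape as the inputs, (1, m).
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

def cost(y_hat, y):
    # Average the m per-sample losses into one scalar J.
    m = y.shape[1]
    return float(np.sum(per_sample_loss(y_hat, y)) / m)

# Hypothetical labels and predictions for m = 4 samples.
y = np.array([[1, 0, 1, 0]])
y_hat = np.array([[0.9, 0.2, 0.6, 0.4]])

losses = per_sample_loss(y_hat, y)
J = cost(y_hat, y)

print(losses.shape)  # (1, 4): one loss per sample
print(J)             # a single number for this iteration
```

The number of features n never appears in the shape of J: whatever the size of X, the losses form one value per column and J collapses them to a single number.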
I need to research that a little further - my matrix math is rusty.
I defer to you, sensei.
Ian
Thanks Paul. I have got my head around that now. Understanding that explains a lot that I was unsure of further into the second course.
Regards
Ian