Week 2 Programming Assignment Exercise 5 - cost function

No, Prof Ng really doesn’t talk much about how to actually code things in the lectures. He focuses on what the computations do; the coding details only come up in the assignments. As always, you start with the mathematical formulas and then figure out how to express them in code, so the first step is to understand what the math actually means.

In this case, you’ve got two vectors Y and A, which are both 1 x m, where m is the number of samples. What you need to compute is the following formula, which is the average of the loss values across all the input samples:

J = -\displaystyle\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(a_i) + (1 - y_i) \log(1 - a_i) \right]
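To see what the formula literally says before worrying about vectorization, here is a minimal non-vectorized sketch that loops over the samples one at a time. It assumes Y and A are plain Python lists of length m (labels in {0, 1}, activations strictly between 0 and 1); the function name `cost_loop` and the sample values are made up for illustration.

```python
import math

def cost_loop(Y, A):
    """Cross-entropy cost J computed sample by sample, exactly as the formula reads.

    Y: list of m labels (0 or 1); A: list of m activations in (0, 1).
    """
    m = len(Y)
    total = 0.0
    for y_i, a_i in zip(Y, A):
        # Loss contribution of sample i (before the leading minus sign and the 1/m).
        total += y_i * math.log(a_i) + (1 - y_i) * math.log(1 - a_i)
    # Apply the leading minus sign and average over the m samples.
    return -total / m

Y = [1, 0, 1]
A = [0.9, 0.2, 0.6]
print(cost_loop(Y, A))  # about 0.2798
```

This is slow for large m, which is exactly why the vectorized numpy versions are preferred, but it is a handy reference to check them against.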

Or you could write that as:

J = -\displaystyle\frac{1}{m} \left( \sum_{i=1}^{m} y_i \log(a_i) + \sum_{i=1}^{m} (1 - y_i) \log(1 - a_i) \right)

As usual, there are lots of ways to write that in python + numpy code. But the fundamental operation in both terms is the same: multiply together the corresponding elements of two 1 x m vectors and then add up those products to compute the sum. I can think of two obvious ways to do that with numpy that are “vectorized” (which we want for performance reasons):

  1. Use np.multiply or “*”, which are two ways to express “elementwise” multiplication, and then use np.sum to add up the products.

  2. Use np.dot to do both operations in one shot (multiply, then add). The catch is that the dot product requires the “inner” dimensions to agree. I can’t dot 1 x m with 1 x m, right? But I can dot 1 x m with m x 1, and then I get a 1 x 1 result, which is what I want (np.squeeze or float() will turn that 1 x 1 array into a plain scalar if needed). So I need to “transpose” the second operand to make that work.
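Here is a small sketch of both methods applied to the first (y = 1) term, so you can see the shapes involved. The arrays are hypothetical example data with m = 4, stored as 1 x m row vectors as in the assignment.

```python
import numpy as np

# Hypothetical example data: m = 4 samples, stored as 1 x m row vectors.
Y = np.array([[1, 0, 1, 1]])          # labels, shape (1, 4)
A = np.array([[0.8, 0.3, 0.6, 0.9]])  # activations, shape (1, 4)

# Method 1: elementwise multiply, then add up the products with np.sum.
term1 = np.sum(Y * np.log(A))         # same as np.sum(np.multiply(Y, np.log(A)))

# Method 2: dot product. (1 x m) dot (m x 1) -> (1 x 1), so transpose the second operand.
term2 = np.dot(Y, np.log(A).T)
print(term2.shape)                    # (1, 1)

# Without the transpose the inner dimensions (4 and 1) don't agree and numpy raises ValueError.
try:
    np.dot(Y, np.log(A))
except ValueError as err:
    print("shape mismatch:", err)

# Both methods give the same number; .item() pulls the scalar out of the 1 x 1 array.
print(np.isclose(term1, term2.item()))  # True
```

Note that np.sum already returns a scalar, while np.dot leaves you with a 1 x 1 array, which is why the assignment's code often squeezes the result.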

So you pick one of those methods and apply it twice: once for the Y = 1 term and once for the Y = 0 term. For more information about the various numpy functions, just google “numpy sum”, “numpy multiply” and “numpy dot”.
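Putting it together, here is a minimal sketch of the full cost computed both ways. The function names `compute_cost` and `compute_cost_dot` are hypothetical and are not the assignment's template; in the notebook you should write the equivalent lines inside whatever function the template gives you.

```python
import numpy as np

def compute_cost(A, Y):
    """Cross-entropy cost J for 1 x m arrays A (activations) and Y (labels).

    Method 1: elementwise multiply + np.sum, applied once per term.
    """
    m = Y.shape[1]
    cost = -(1 / m) * (np.sum(Y * np.log(A)) + np.sum((1 - Y) * np.log(1 - A)))
    return float(cost)

def compute_cost_dot(A, Y):
    """Same cost using method 2: np.dot with the second operand transposed."""
    m = Y.shape[1]
    cost = -(1 / m) * (np.dot(Y, np.log(A).T) + np.dot(1 - Y, np.log(1 - A).T))
    # cost is a 1 x 1 array here; np.squeeze reduces it to a 0-d value for float().
    return float(np.squeeze(cost))

Y = np.array([[1, 0, 1]])
A = np.array([[0.9, 0.2, 0.6]])
print(compute_cost(A, Y), compute_cost_dot(A, Y))  # both about 0.2798
```

Either version is fine; method 1 avoids having to think about the transpose, while method 2 does the multiply and the add in a single call.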
