Week 3 - Exercise 5 - compute_cost

This function fails the test because of a small difference between the expected and the calculated cost:

Expected cost = 0.6930587610394646

Actual calculated cost = 0.6926858869721941

There is a 0.0003728740672705 difference

GRADED FUNCTION: compute_cost

def compute_cost(A2, Y):
    """
    Computes the cross-entropy cost given in equation (13)

    A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)

    cost -- cross-entropy cost given equation (13)
    """
    m = Y.shape[1] # number of examples

    # Compute the cross-entropy cost
    # (≈ 2 lines of code)
    # logprobs = ...
    # cost = ...
    logprobs = np.multiply(np.log(A2), Y)
    cost = - np.sum(logprobs)
    cost = float(np.squeeze(cost))  # makes sure cost is the dimension we expect.
                                    # E.g., turns [[17]] into 17
    return cost

Error: Wrong output
1 Tests passed
1 Tests failed

AssertionError Traceback (most recent call last)
3 print("cost = " + str(compute_cost(A2, t_Y)))
----> 5 compute_cost_test(compute_cost)

~/work/release/W3A1/public_tests.py in compute_cost_test(target)
152 ]
--> 154 single_test(test_cases, target)
156 def backward_propagation_test(target):

~/work/release/W3A1/test_utils.py in single_test(test_cases, target)
119 print('\033[92m', success," Tests passed")
120 print('\033[91m', len(test_cases) - success, " Tests failed")
--> 121 raise AssertionError("Not all tests were passed for {}. Check your equations and avoid using global variables inside the function.".format(target.name))
123 def multiple_test(test_cases, target):

AssertionError: Not all tests were passed for compute_cost. Check your equations and avoid using global variables inside the function.

Expected output

cost = 0.6930587610394646


An error in the 3rd decimal place is not a rounding error. You have just used the sample code that they gave you. That is not a complete solution, which they actually told you in the instructions although perhaps their wording was a bit too subtle.

Compare your implementation to the mathematical formula for the cost shown in the instructions (equation 13) and ask yourself two questions:

  1. What happened to the factor of 1/m?

  2. Why is there only one term? What happened to the Y = 0 term?


When I put in 1/m I get cost = 0.23089529565739805, which is way off. The instructions state:

Instructions :

  • There are many ways to implement the cross-entropy loss. This is one way to implement one part of the equation without for loops: $-\sum\limits_{i=1}^{m} y^{(i)}\log(a^{[2](i)})$:
logprobs = np.multiply(np.log(A2),Y)
cost = - np.sum(logprobs)          
  • Use that to build the whole expression of the cost function.

I got it, I misunderstood the instructions.
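
A note for future readers on the numbers quoted above: the value obtained after adding only 1/m is the sample-code result divided by 3, which suggests the test case uses m = 3 examples. That is an inference from the posted figures, not something stated in the thread. A small sketch of the arithmetic (variable names are made up for illustration):

```python
import math

partial_sum = 0.6926858869721941       # cost reported with the sample code only
with_one_over_m = 0.23089529565739805  # cost reported after adding only 1/m
expected = 0.6930587610394646          # expected output from the test

# Inferred number of examples (assumption: the only change was the 1/m factor)
m = round(partial_sum / with_one_over_m)

# What the missing Y = 0 term would have to sum to, given the expected cost
missing_term = m * expected - partial_sum
```

So the 1/m factor alone cannot fix the result; the second term of the equation has to be there as well.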

I ran into the same problem and frustration. To clarify for future learners, the instructions provide examples of np.multiply(…) and np.sum(…) and tell us to: "Use that to build the whole expression of the cost function."

I, and others, interpreted that as a direction rather than a suggestion. Also, in this practice dataset, using only the code provided in the instructions yields a result that is very close to the correct output (as above). What the instructions mean to say (and could perhaps be reworded to say more clearly) is that the whole cost function is described in Equation 13 (as in the lectures) and np.multiply() is a convenient way to assemble each piece of it.

Don’t be misled into thinking that the code they provide here is the entirety of the appropriate calculation.


Saved me man! thank you!

logprobs = np.multiply(np.log(A2),Y)

The sample code in the instructions computes only $-\sum\limits_{i=1}^{m} y^{(i)}\log(a^{[2](i)})$.
However, the cost function in this exercise is $$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \right) \tag{13}$$
So the sample is missing the $(1-y^{(i)})$ term and the factor of $\frac{1}{m}$.
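
For future readers, here is a minimal sketch of how equation (13) might be assembled from the sample lines. The function name `cross_entropy_cost` and the toy arrays are hypothetical; they are not taken from the notebook or its test cases.

```python
import numpy as np

def cross_entropy_cost(A2, Y):
    """Sketch of equation (13); A2 and Y both have shape (1, m)."""
    m = Y.shape[1]  # number of examples
    # Both terms of the sum: the Y = 1 part from the sample code plus the Y = 0 part
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1 - A2), 1 - Y)
    cost = -np.sum(logprobs) / m
    return float(np.squeeze(cost))  # e.g. turns [[17]] into 17

# Toy values, made up for illustration
A2 = np.array([[0.5, 0.5, 0.5]])
Y = np.array([[1, 0, 1]])
cost = cross_entropy_cost(A2, Y)  # every prediction is 0.5, so the cost is log(2)
```

With every activation at 0.5, each example contributes log(2) regardless of its label, which makes this an easy sanity check.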

Yes, that is not the whole answer, just part of it. I addressed this point in my earlier reply on this thread.


The point of that code sample is not to give you the complete implementation, but to show you one set of numpy operations that will help you build the solution in a vectorized way.
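
As an illustration of "one set of numpy operations" among several, the first term can also be written as a dot product. This is only a sketch with made-up arrays, and it still covers just the first term of the cost, not the whole expression.

```python
import numpy as np

A2 = np.array([[0.8, 0.3, 0.6]])  # made-up activations, shape (1, 3)
Y = np.array([[1, 0, 1]])          # made-up labels, shape (1, 3)

# Form used in the instructions: elementwise product, then sum
term_multiply = -np.sum(np.multiply(np.log(A2), Y))

# Alternative: a (1, m) by (m, 1) dot product, which yields a (1, 1) array
term_dot = -np.dot(Y, np.log(A2).T)
term_dot = float(np.squeeze(term_dot))
```

Both forms give the same number; the dot product version returns a 2-D array, which is why the squeeze is needed before converting to a float.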

After seeing this issue something came to my mind and I don’t have an answer yet…

How come the expected cost can be the same for everyone (0.6930587610394646) if we use a set of random W parameters that can vary among all the different users?
I would have thought that, given W parameters are different (random), then the A2 will be different as well among users, and thus the initial cost calculation will be also different. How can it match a predefined value?

Thank you!

It is an interesting question. You should have a look at the function compute_cost_test_case, which is in the file testCases_v2.py. What you will find is that they do the same trick that they do everywhere here to get consistent results when random number generation functions are involved: they set the "random seed" value before the call to the PRNG function. That guarantees that you get consistent results every time. You can read the documentation by googling "numpy random seed".

Of course you would not do this in a real application, because the whole point is that you actually do want the values to be random. But for their purposes here, it makes things a lot easier to implement the grader and the test cases in the notebooks.
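
A small sketch of the seeding trick, in case it helps. The seed value 1 and the array shape are arbitrary choices for illustration, not necessarily what the test case file uses.

```python
import numpy as np

np.random.seed(1)               # fix the PRNG state
first = np.random.randn(2, 3)

np.random.seed(1)               # reset to the same state
second = np.random.randn(2, 3)

# The two draws are identical because the generator started from the same seed
same = np.array_equal(first, second)
```

Anyone running the same code with the same seed gets the same "random" weights, which is how a single expected cost can be checked for every learner.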

Hi Paul!
Thank you so much for your explanation!
It makes a LOT more sense now :)

This translates to including logprobs as the first part of the cost function we are used to. The logprobs line is just an alternative way of calculating that first part.