def compute_cost(A2, Y):
“”"
Computes the cross-entropy cost given in equation (13)
Arguments:
A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
Y -- "true" labels vector of shape (1, number of examples)
Returns:
cost -- cross-entropy cost given equation (13)
"""
m = Y.shape[1] # number of examples
# Compute the cross-entropy cost
# (≈ 2 lines of code)
# logprobs = ...
# cost = ...
# YOUR CODE STARTS HERE
logprobs = np.multiply(np.log(A2), Y)
cost = - np.sum(logprobs)
# YOUR CODE ENDS HERE
cost = float(np.squeeze(cost)) # makes sure cost is the dimension we expect.
# E.g., turns [[17]] into 17
return cost
~/work/release/W3A1/test_utils.py in single_test(test_cases, target)
    119     print('\033[92m', success, " Tests passed")
    120     print('\033[91m', len(test_cases) - success, " Tests failed")
--> 121     raise AssertionError("Not all tests were passed for {}. Check your equations and avoid using global variables inside the function.".format(target.__name__))
122
123 def multiple_test(test_cases, target):
AssertionError: Not all tests were passed for compute_cost. Check your equations and avoid using global variables inside the function.
An error in the third decimal place is not a rounding error. You have just used the sample code that they gave you. That is not a complete solution, which they actually told you in the instructions, although perhaps their wording was a bit too subtle.
Compare your implementation to the mathematical formula for the cost shown in the instructions (equation 13) and ask yourself two questions:
What happened to the factor of 1/m?
Why is there only one term? What happened to the Y = 0 term?
When I put in the 1/m I get cost = 0.23089529565739805, which is way off. The instructions state:
Instructions:
There are many ways to implement the cross-entropy loss. This is one way to implement one part of the equation without for loops: - \sum\limits_{i=1}^{m} y^{(i)}\log(a^{[2](i)})
I ran into the same problem and frustration. To clarify for future learners, the instructions provide examples of np.multiply(…) and np.sum(…) and tell us to: “Use that to build the whole expression of the cost function.”
I, and others, interpreted that as a direction rather than a suggestion. Also, in this practice dataset, using only the code provided in the instructions yields a result that is very similar to the correct output (as above). What the instructions mean to say (and could perhaps be modified to say more clearly) is that the whole cost function is described in Equation 13 (as in the lectures), and np.multiply() is a convenient way to assemble each piece of it.
Don’t be misled into thinking that the code they provide here is the entirety of the appropriate calculation.
The code in the example computes - \sum\limits_{i=1}^{m} y^{(i)}\log(a^{[2](i)})
However, the cost function in this exercise is J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[2](i)}\right) + (1-y^{(i)})\log\left(1- a^{[2](i)}\right) \right) \tag{13}
And we are missing the - \frac{1}{m} factor as well.
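Putting both fixes together, a minimal sketch of the complete vectorized computation might look like this (same variable names as the notebook; this illustrates the pattern, it is not the official graded solution):

```python
import numpy as np

def compute_cost(A2, Y):
    """Sketch of equation (13): both log terms plus the 1/m factor."""
    m = Y.shape[1]  # number of examples
    # y*log(a2) covers the Y = 1 examples; (1-y)*log(1-a2) covers the Y = 0 examples
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1 - A2), 1 - Y)
    cost = -np.sum(logprobs) / m  # the 1/m factor the sample code leaves out
    return float(np.squeeze(cost))
```

np.multiply does the elementwise product, so the sum over all m examples needs no for loop.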
The point of that code sample is not to give you the complete implementation, but to show you one set of numpy operations that will help you build the solution in a vectorized way.
Hi!
After seeing this issue, a question came to mind that I don't yet have an answer to…
How come the expected cost can be the same for everyone (0.6930587610394646) if we use a set of random W parameters that can vary among all the different users?
I would have thought that, given W parameters are different (random), then the A2 will be different as well among users, and thus the initial cost calculation will be also different. How can it match a predefined value?
It is an interesting question. You should have a look at the function compute_cost_test_case, which is in the file testCases_v2.py. What you will find is that they do the same trick that they do everywhere here to get consistent results when random number generation functions are involved: they set the “random seed” value before the call to the PRNG function. That guarantees that you get consistent results every time. You can read the documentation by googling “numpy random seed”.
Of course you would not do this in a real application, because the whole point is that you actually do want the values to be random. But for their purposes here, it makes things a lot easier to implement the grader and the test cases in the notebooks.
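A quick demonstration of the trick (the seed value and array shape here are arbitrary, purely for illustration):

```python
import numpy as np

np.random.seed(2)                # fix the seed...
first = np.random.randn(4, 2)    # ...so these "random" draws are reproducible

np.random.seed(2)                # reset to the same seed
second = np.random.randn(4, 2)   # the PRNG replays the exact same stream

print(np.array_equal(first, second))  # True: identical values every run
```

That is why every learner's "randomly" initialized W produces the same A2, and therefore the same initial cost.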
This translates to including logprobs as the first part of the cost function we are used to. logprobs is just an alternative, vectorized way of calculating the first term of the cost function.
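For instance, these two expressions yield the same first term (toy values made up for the demo):

```python
import numpy as np

A2 = np.array([[0.8, 0.3, 0.9]])  # toy sigmoid outputs, invented for this demo
Y = np.array([[1, 0, 1]])         # toy labels

term_multiply = -np.sum(np.multiply(np.log(A2), Y))  # elementwise product, then sum
term_dot = -np.dot(np.log(A2), Y.T).item()           # same value via a dot product

print(np.isclose(term_multiply, term_dot))  # True
```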