Week 3, Programming Assignment: 4.4 - Compute the Cost

I am working on the 4.4 - Compute the cost assignment and I am getting a validation error for the 2nd assertion:

Which means that the formula works well at least for the 1st assertion.

As the formula seems correct to me, and because there is really not much that can be tweaked in this one line, could anybody give me a pointer on what might be wrong with it?

Thanks for any help
Jonathan

Hi @Jonathanlugo, there is nothing wrong with what you are doing (please remove the code snippet, as publishing it is against the code of conduct). But as you can see from the instructions above the exercise, you have only implemented one part of the function, the part for which the example is given. There is a second part in the formula that you need to implement. Hope this helps you find the solution.

Thanks for the pointer! I rechecked the formula and found the “missing” second part. Now the formula is 100% represented in Python.

However, I am still stuck and I will try to describe the problem without posting code:

According to the formula, there are two parts. Here is what I am doing:

  1. I am using the dot product in both parts. This reduces the matrix to 1 scalar per example (loss).
  2. I am transposing Y so that the dimensions align.
  3. I am using the A2 vector in both parts. The A2 vector holds the activations of the 2nd layer.
  4. I am adding the result (loss) of this sum for EACH example to the cost (via np.sum).
  5. I am dividing the result by the number of examples (m). This gives the avg. cost of the batch.

As mentioned before, 1 validation is passing and the next is not.

I checked the formula symbol by symbol and it seems 100% correct.

Thank you for any help
Jonathan

Everything you describe sounds correct. Well, if you used dot product for the cost, it shouldn’t be necessary to use np.sum, but it shouldn’t do any real harm. The point of a dot product is that it performs the “elementwise” product of the two vectors and then adds those up to produce a scalar (well, a 1 x 1 array).

Please show us the actual output you get from running the test. It’s fine to show that and it should shed some light on why the test is failing. Note that it should say something about whether it’s the data type, the “shape” or the actual value that it is not accepting.

Update: I took a look at the test code for compute_cost and the first test just checks that your value has the datatype “scalar float”. The second test verifies the actual value. So just passing the first test doesn’t really tell you much. :scream_cat:
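
To make the dot-product point concrete, here is a minimal sketch using two made-up row vectors (`v` and `w` are hypothetical and not from the assignment). It shows that a (1, m) by (m, 1) dot product already includes the sum, so an extra np.sum is redundant. It also shows one possible way to turn the 1 x 1 result into a plain float, which is an assumption about what a “scalar float” check would accept, not the official solution:

```python
import numpy as np

# Hypothetical row vectors, each of shape (1, m) with m = 4.
v = np.array([[1.0, 2.0, 3.0, 4.0]])
w = np.array([[0.5, 0.5, 2.0, 1.0]])

# Dot product of (1, m) with (m, 1): multiply elementwise, then add up.
d = np.dot(v, w.T)
print(d.shape)  # (1, 1)

# The same value from an explicit elementwise product followed by a sum.
s = np.sum(v * w)

# Calling np.sum on the 1 x 1 result is redundant but harmless.
assert np.isclose(d[0, 0], s)
assert np.isclose(np.sum(d), s)

# Convert the 1 x 1 array into a plain Python float.
as_float = float(np.squeeze(d))
print(type(as_float))
```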

Hi Paul, thanks for replying.

I am attaching the output, as requested. Just FYI, I passed all other tests, got graded and all. So I am not “blocked”, which is great. However, there must be something wrong with either my formula or with the grader. I would just like to understand this part of the material.

Here is the output:

2.0796608964759784 3
2.0796608964759784 3
cost = 2.0796608964759784
3.6717539072244776 5
3.6717539072244776 5
Error: Wrong output
1 Tests passed
1 Tests failed

AssertionError Traceback (most recent call last)
in
3 print("cost = " + str(compute_cost(A2, t_Y)))
4
----> 5 compute_cost_test(compute_cost)

~/work/release/W3A1/public_tests.py in compute_cost_test(target)
140 ]
141
--> 142 single_test(test_cases, target)
143
144 def backward_propagation_test(target):

~/work/release/W3A1/test_utils.py in single_test(test_cases, target)
119 print('\033[92m', success," Tests passed")
120 print('\033[91m', len(test_cases) - success, " Tests failed")
--> 121 raise AssertionError("Not all tests were passed for {}. Check your equations and avoid using global variables inside the function.".format(target.__name__))
122
123 def multiple_test(test_cases, target):

AssertionError: Not all tests were passed for compute_cost. Check your equations and avoid using global variables inside the function.

Expected output

cost = 0.6930587610394646

Thanks again and have a good start to the week
Jonathan

Ah, ok, it helps to see the numbers. Now that I read your verbal description of your code again, I think I know what the problem is. It worries me that you say you are transposing Y so that the dimensions align.

But both Y and A2 are row vectors, right? Meaning they are both 1 x m, where m is the number of samples. If you transpose Y, then it becomes m x 1, so you end up with the dot product being m x 1 dot 1 x m which gives an m x m output. That’s why you needed the np.sum, even though you are using the dot product. On the other hand if you transpose A2 instead, then your dot product will be 1 x m dot m x 1 which gives you a 1 x 1 output.

Notice that these operations are not commutative. Let’s construct a simple example to show what I mean:

>>> v = np.array([[1,2,3,4]])
>>> v.shape
(1, 4)
>>> w = v.T
>>> w.shape
(4, 1)
>>> w
array([[1],
       [2],
       [3],
       [4]])
>>> np.dot(v, v.T)
array([[30]])
>>> np.dot(v.T, v)
array([[ 1,  2,  3,  4],
       [ 2,  4,  6,  8],
       [ 3,  6,  9, 12],
       [ 4,  8, 12, 16]])
>>> np.sum(np.dot(v.T, v))
100

So you can see that doing it the way you did gives a completely different answer that has no relationship to the correct answer. If you check the $v \cdot v^T$ case, you’ll see it’s the sum of the squares of the elements of v:

1 + 4 + 9 + 16 = 30

You should also recognize the result you get in the other case: it’s the multiplication table for the numbers 1 to 4, right? Adding that up gives a completely different value.

I tried making the mistake that I am theorizing you made and I get the exact same cost value you show:

cost = 2.0796608964759784
Error: Wrong output
 1  Tests passed
 1  Tests failed
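
For anyone who wants to reproduce both numbers, here is a minimal sketch. The A2 and Y values are copied from the debug output posted further down in this thread (first test case, m = 3), with True/False written as 1.0/0.0. The "wrong" branch is only a guess at the mistake described above (transposing Y instead of the log term), not Jonathan's actual code, and the variable names are made up for illustration:

```python
import numpy as np

# First test case, taken from the debug output in this thread (m = 3).
A2 = np.array([[0.5002307, 0.49985831, 0.50023963]])
Y = np.array([[1.0, 0.0, 0.0]])  # True, False, False
m = Y.shape[1]

# Suspected mistake: (m, 1) dot (1, m) gives an (m, m) matrix,
# so np.sum adds up m * m products instead of m.
wrong_terms = np.dot(Y.T, np.log(A2)) + np.dot((1 - Y).T, np.log(1 - A2))
wrong_cost = -np.sum(wrong_terms) / m

# Transposing the log term instead: (1, m) dot (m, 1) gives (1, 1).
right_terms = np.dot(Y, np.log(A2).T) + np.dot(1 - Y, np.log(1 - A2).T)
right_cost = -float(np.squeeze(right_terms)) / m

print(wrong_terms.shape, right_terms.shape)  # (3, 3) (1, 1)
print(f"{wrong_cost:.4f}")  # 2.0797, the value the failing run printed
print(f"{right_cost:.4f}")  # 0.6931, the expected output
```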

You are absolutely right and I should have known that :man_facepalming:. Actually, somewhere in the process I must have (wrongly) transposed Y to get the incorrect m x m matrix shape. Now I have the correct shape and my only problem left seems to be the average cost of the losses. This should be easy to resolve.

Thanks a lot Paul for your help and until next time :+1:t3:

Jonathan

Sounds good! But I don’t know what you mean about the average of the losses being a problem. Now that you have the dot products done correctly, you get two 1 x 1 arrays (one for the Y = 1 term and one for the Y = 0 term). You add those together and then multiply by $\frac{1}{m}$ to compute the average, because there is one loss term for every sample (and there are m samples).

Well, as you say and as per the formula, I am multiplying 1/m by the sum of the two dot products (the Y = 1 term and the Y = 0 term) to get the average cost, and I am still getting a failed test (I added some print statements for clarity):

=====
log(A2.T)
[[-0.69268589]
 [-0.6934306 ]
 [-0.69266804]]
1 - A2.T:
[[-0.69360869]
 [-0.69286384]
 [-0.69362656]]
Y:
[[ True False False]]
A2:
[[0.5002307  0.49985831 0.50023963]]
cost: [[-0.69305876]], m: 3
=====
log(A2.T)
[[-0.69268589]
 [-0.6934306 ]
 [-0.69266804]]
1 - A2.T:
[[-0.69360869]
 [-0.69286384]
 [-0.69362656]]
Y:
[[ True False False]]
A2:
[[0.5002307  0.49985831 0.50023963]]
cost: [[-0.69305876]], m: 3
cost = -0.6930587610394646
=====
log(A2.T)
[[-0.69268589]
 [-0.6934306 ]
 [-0.69266804]
 [-1.38629436]
 [-0.35667494]]
1 - A2.T:
[[-0.69360869]
 [-0.69286384]
 [-0.69362656]
 [-0.28768207]
 [-1.2039728 ]]
Y:
[[ True False False False  True]]
A2:
[[0.5002307  0.49985831 0.50023963 0.25       0.7       ]]
cost: [[-0.54470666]], m: 5
=====
log(A2.T)
[[-0.69268589]
 [-0.6934306 ]
 [-0.69266804]
 [-1.38629436]
 [-0.35667494]]
1 - A2.T:
[[-0.69360869]
 [-0.69286384]
 [-0.69362656]
 [-0.28768207]
 [-1.2039728 ]]
Y:
[[ True False False False  True]]
A2:
[[0.5002307  0.49985831 0.50023963 0.25       0.7       ]]
cost: [[-0.54470666]], m: 5
Error: Wrong output
 1  Tests passed
 1  Tests failed

So I think I am making a mistake somewhere in averaging the cost. The formula is straightforward so I may have missed a detail here.

Notice that your cost values are negative. The log of a number between 0 and 1 is negative. Note that the formula for log loss in the Y = 1 case is $-y \log(\hat{y})$ for a single sample, right? Take another look at the math formula: they factor out the -1 just to make it look simpler.
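
For reference, here is the cost written out with the sign made explicit. This assumes the standard binary cross-entropy cost, with $a^{[2](i)}$ denoting the layer-2 activation for sample $i$ (the notation is an assumption, so check it against the formula in the notebook):

```latex
J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[2](i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{[2](i)}\right) \right)
```

Each log is negative because the activations lie strictly between 0 and 1, so the sum in parentheses is negative and the leading minus sign is what makes the cost positive.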

I added similar instrumentation and here are my values:

m = 3
Y = [[ True False False]]
A2 = [[0.5002307  0.49985831 0.50023963]]
log(A2.T) = [[-0.69268589]
 [-0.6934306 ]
 [-0.69266804]]
log(1 - A2.T) = [[-0.69360869]
 [-0.69286384]
 [-0.69362656]]

Hi! I got stuck at this step too. I got a cost of 0.6928… and the expected value is 0.6930… At first I thought it was due to a random number or something, since the values were so close.

After reading through the discussion, I got some clue and found the mistakes. It was by coincidence that the wrong value was so close to the real one. I was searching in the wrong place initially.

Thank you !

Hi Paul, thanks for your time and support, I found the error and the code is now 100% clean.

Thank you very much
Jonathan

Thanks a lot for the examples; they clarify how np.dot works with matrices.
I transposed A2 instead of Y, following the cost function formula, and it works as you suggest.
I have many more questions now, but here is one:

cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
This works even better: there is no need to worry about the matrices’ shapes, and it takes less code.
But why, theoretically, does the mean of the elementwise product of two matrices give the same result as the dot product (product + sum)?
I hope that is clear!

Because the definition of “mean” is the average, right? That’s the sum of the elements divided by the number of them. And the individual elements you are averaging are the product of two numbers.

So you can do that in one step with a dot product (well, you still have to divide by m separately) or you can do a product followed by a sum. You get the same answer either way.
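
Here is a small sketch of that equivalence, using the A2 and Y values from the second test case in the debug output above (m = 5, with True/False written as 1.0/0.0). The variable names are made up for illustration. One caveat worth stating: np.mean divides by the total number of elements, which equals m here only because the arrays have shape (1, m).

```python
import numpy as np

# Second test case, taken from the debug output in this thread (m = 5).
A2 = np.array([[0.5002307, 0.49985831, 0.50023963, 0.25, 0.7]])
Y = np.array([[1.0, 0.0, 0.0, 0.0, 1.0]])  # True, False, False, False, True
m = Y.shape[1]

# Per-sample losses, shape (1, m).
losses = -(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))

# Route 1: mean of the elementwise products.
cost_mean = np.mean(losses)

# Route 2: elementwise product, then sum, then divide by m.
cost_sum = np.sum(losses) / m

# Route 3: dot products (product and sum in one step), then divide by m.
dot_terms = np.dot(Y, np.log(A2).T) + np.dot(1 - Y, np.log(1 - A2).T)
cost_dot = -float(np.squeeze(dot_terms)) / m

# All three routes agree.
assert np.isclose(cost_mean, cost_sum)
assert np.isclose(cost_mean, cost_dot)
print(f"{cost_mean:.4f}")  # 0.5447
```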

The fundamental point here is that you need to understand the meaning of the mathematical formula. Then you need to translate that into code. There can be multiple correct methods to code a particular mathematical expression.