While the grader lets you “pass” the tests for this function when you use np.sum in the cost function, to match the expected output, you have to use np.trace instead (which roughly halves the cost). What is the advantage of using the trace function (beyond decreasing the cost value) and can you give any intuition about why summing the entries along the diagonal of the matrix is just as valid as summing all the entries in the matrix?
If you had to use np.trace, I think that indicates there is something wrong in your code. For starters, if you are using a dot product to compute the cost, the result should be 1 x 1, not n x n for some value of n > 1, right? And if you compute the cost using the dot product correctly, no sum is required. The point of using the dot product is that it combines both operations (product and sum) into one operation.
We’re not supposed to write the code for you, so let’s talk about some general examples. Suppose I have a vector v that is 1 x 4 and I want to compute the sum of the squares of the elements of v. I can think of several ways to do that using numpy functions. Try running the following and watch what happens:
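import numpy as np
v = np.array([[1, 2, 3, 4]])          # 1 x 4 row vector
print(v.shape)
print(np.sum(v * v))                  # elementwise product, then sum
print(np.sum(v ** 2))                 # elementwise square, then sum
print(np.sum(np.power(v, 2)))         # same thing via np.power
print(np.dot(v, v.T))                 # 1 x 4 dot 4 x 1 -> 1 x 1
print(np.dot(v.T, v))                 # 4 x 1 dot 1 x 4 -> 4 x 4
print(np.sum(np.dot(v.T,v)))          # sum of all 16 entries
print(np.trace(np.dot(v.T,v)))        # sum of the diagonal entries only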
Here’s what that generates:
(1, 4)
30
30
30
[[30]]
[[ 1  2  3  4]
 [ 2  4  6  8]
 [ 3  6  9 12]
 [ 4  8 12 16]]
100
30
So you can get the right answer using np.trace, but only because you are compensating for a previous error.
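To see where the last two numbers come from, here is a short derivation. It treats v as the 1 x n row vector from the example, and the entry names v_i are just notation introduced for this sketch:

```latex
\begin{align*}
\left(v^{\top} v\right)_{ij} &= v_i \, v_j \\
\operatorname{tr}\!\left(v^{\top} v\right) &= \sum_{i} v_i^{2} = v \, v^{\top} = 1 + 4 + 9 + 16 = 30 \\
\sum_{i,j} \left(v^{\top} v\right)_{ij} &= \Bigl(\sum_{i} v_i\Bigr)^{2} = (1 + 2 + 3 + 4)^{2} = 100
\end{align*}
```

So the trace picks out exactly the terms where i = j, which are the squares, while np.sum over the whole matrix also adds in all the cross terms.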
The trick is to do the “dimensional analysis” on the dot product. In the example I gave, v is 1 x 4, so v.T (the transpose) is 4 x 1. If you do the dot product this way:
np.dot(v, v.T)
Then you are multiplying a 1 x 4 matrix by a 4 x 1 matrix, and the result is 1 x 1.
If you do the dot product this way:
np.dot(v.T, v)
Then that’s 4 x 1 dot 1 x 4, which gives a 4 x 4 result. Examining the output shown in the previous post, the results of that second dot product should look familiar: it’s just the multiplication table of the numbers 1 through 4. The diagonal of that table holds the squares 1, 4, 9 and 16, which is why taking the trace happens to give you the sum of the squares. But you are going to way more trouble than you need to in order to do that. Just use the first version of the dot product and you get what you need in one shot: the sum of the products of the corresponding elements of the two vectors.
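The same shape check carries over to two different vectors. Here is a minimal sketch, assuming two made-up 1 x 4 row vectors a and b (they are hypothetical and not taken from the assignment):

```python
import numpy as np

# Hypothetical 1 x 4 row vectors, chosen only for illustration.
a = np.array([[1, 2, 3, 4]])
b = np.array([[5, 6, 7, 8]])

inner = np.dot(a, b.T)   # (1, 4) dot (4, 1) -> (1, 1)
outer = np.dot(a.T, b)   # (4, 1) dot (1, 4) -> (4, 4)

print(inner.shape, outer.shape)   # (1, 1) (4, 4)
print(inner.item())               # 70: sum of products of corresponding elements
print(np.sum(a * b))              # 70: same value, done as two separate steps
print(np.trace(outer))            # 70: diagonal of the 4 x 4 result
print(np.sum(outer))              # 260: every entry, i.e. sum(a) * sum(b)
```

Printing the shapes before trusting the numbers is a cheap way to notice that a dot product came out n x n when you expected 1 x 1.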
Thank you so much! That really helped clear things up!