Wk 2 Cost function in propagate uses np.trace, not np.sum--why?

ktkc · April 27, 2021, 9:06pm

While the grader lets you “pass” the tests for this function when you use np.sum in the cost function, to match the expected output, you have to use np.trace instead (which roughly halves the cost). What is the advantage of using the trace function (beyond decreasing the cost value) and can you give any intuition about why summing the entries along the diagonal of the matrix is just as valid as summing all the entries in the matrix?

paulinpaloalto · April 27, 2021, 10:26pm

If you had to use np.trace, I think that indicates there is something wrong in your code. For starters, if you are using a dot product to compute the cost, the result should be 1 x 1, not n x n for some value of n > 1, right? And if you compute the cost using the dot product correctly, no sum is required. The point of using the dot product is that it combines both operations (product and sum) into one operation.

We’re not supposed to write the code for you, so let’s talk about some general examples. Suppose I have a vector v that is 1 x 4 and I want to compute the sum of the squares of the elements of v. I can think of several ways to do that using numpy functions. Try running the following and watch what happens:

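import numpy as np

v = np.array([[1, 2, 3, 4]])
print(v.shape)
# Elementwise square followed by a sum: three equivalent spellings
print(np.sum(v * v))
print(np.sum(v ** 2))
print(np.sum(np.power(v, 2)))
# 1 x 4 dot 4 x 1: product and sum in one step
print(np.dot(v, v.T))
# 4 x 1 dot 1 x 4: a full 4 x 4 matrix of pairwise products
print(np.dot(v.T, v))
print(np.sum(np.dot(v.T,v)))
print(np.trace(np.dot(v.T,v)))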

Here’s what that generates:

(1, 4)
30
30
30
[[30]]
[[ 1  2  3  4]
 [ 2  4  6  8]
 [ 3  6  9 12]
 [ 4  8 12 16]]
100
30

So you can get the right answer using np.trace, but only because you are compensating for a previous error.
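
As a rough sketch of why the two reductions disagree on the 4 x 4 result: entry (i, j) of that matrix is v[i] * v[j], so summing every entry gives the square of the sum of the elements, while the trace keeps only the i == j terms, which is the sum of the squares. The names outer, sum_all and sum_diag are illustrative only and are not from the assignment.

```python
import numpy as np

v = np.array([[1, 2, 3, 4]])          # shape (1, 4)
outer = np.dot(v.T, v)                # shape (4, 4): entry (i, j) is v[i] * v[j]

sum_all = int(np.sum(outer))          # every pairwise product: (1 + 2 + 3 + 4) ** 2
sum_diag = int(np.trace(outer))       # only the i == j products: sum of squares

print(sum_all, int(np.sum(v)) ** 2)   # 100 100
print(sum_diag, int(np.sum(v * v)))   # 30 30
```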

paulinpaloalto · April 27, 2021, 10:36pm

The trick is to do the “dimensional analysis” on the dot product. In the example I gave, v is 1 x 4, so v.T (the transpose) is 4 x 1. If you do the dot product this way:

np.dot(v, v.T)

Then you are dotting 1 x 4 dot 4 x 1 and the result is 1 x 1.

If you do the dot product this way:

np.dot(v.T, v)

Then that’s 4 x 1 dot 1 x 4 which gives a 4 x 4 result. Examining the output shown in the previous post, the results of that second dot product should look familiar: it’s just the multiplication table of the numbers 1 through 4. That’s why taking the trace happens to give you the sum of the squares. But you are going to way more trouble than you need to in order to do that. Just use the first version of the dot product and you get what you need in one shot: the sum of the products of the corresponding elements of the two vectors.
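
The same shape check can be sketched for two different vectors. The values of a and b below are hypothetical, picked only for illustration, and this is not the assignment's cost code. It assumes both inputs are 1 x 4 row vectors.

```python
import numpy as np

# Hypothetical vectors, chosen only for illustration.
a = np.array([[1.0, 2.0, 3.0, 4.0]])    # shape (1, 4)
b = np.array([[0.5, 0.25, 2.0, 1.0]])   # shape (1, 4)

inner = np.dot(a, b.T)   # (1, 4) dot (4, 1) -> (1, 1)
outer = np.dot(a.T, b)   # (4, 1) dot (1, 4) -> (4, 4)

print(inner.shape, outer.shape)

# The 1 x 1 result already holds the sum of the elementwise products,
# so no extra np.sum or np.trace is needed; .item() pulls out the scalar.
scalar = inner.item()
assert np.isclose(scalar, np.sum(a * b))
# The trace of the 4 x 4 version recovers the same number the long way round.
assert np.isclose(scalar, np.trace(outer))
```

Printing the shapes before reducing anything is a cheap way to catch a transpose on the wrong operand.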

ktkc · May 3, 2021, 10:43pm

Thank you so much! That really helped clear things up!
