Ah, ok, it helps to see the numbers. Now that I read your verbal description of your code again, I think I know what the problem is. It worries me that you say:

But both Y and A2 are row vectors, right? That means they are both 1 x m, where m is the number of samples. If you transpose Y, it becomes m x 1, so the dot product is m x 1 dot 1 x m, which gives an m x m output. That's why you needed the `np.sum`, even though you are using the dot product. On the other hand, if you transpose A2 instead, the dot product is 1 x m dot m x 1, which gives you a 1 x 1 output.

Notice that these operations are not commutative. Let's construct a simple example to show what I mean:

```
>>> import numpy as np
>>> v = np.array([[1,2,3,4]])
>>> v.shape
(1, 4)
>>> w = v.T
>>> w.shape
(4, 1)
>>> w
array([[1],
       [2],
       [3],
       [4]])
>>> np.dot(v, v.T)
array([[30]])
>>> np.dot(v.T, v)
array([[ 1,  2,  3,  4],
       [ 2,  4,  6,  8],
       [ 3,  6,  9, 12],
       [ 4,  8, 12, 16]])
>>> np.sum(np.dot(v.T, v))
100
```

So you can see that doing it the way you did gives a completely different answer that has no relationship to the correct answer. If you check the $v \cdot v^T$ case, you'll see it's the sum of the squares of the elements of v:

`1 + 4 + 9 + 16 = 30`

You should also recognize the result you get in the other case: it's the multiplication table for the numbers 1 to 4, right? Adding that up gives a completely different value.
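If it helps to see why the two numbers differ, here is a small check using the same `v` as above. Summing every entry of the multiplication table is the same as squaring the sum of the elements, which is not the same thing as the sum of the squares. The names `inner` and `outer` are just labels I picked for this sketch.

```python
import numpy as np

v = np.array([[1, 2, 3, 4]])

# (1, 4) dot (4, 1) -> (1, 1): the sum of the squares of the elements.
inner = np.dot(v, v.T)

# (4, 1) dot (1, 4) -> (4, 4): every pairwise product v[i] * v[j].
outer = np.dot(v.T, v)

sum_of_squares = inner.item()            # 1 + 4 + 9 + 16
sum_of_table = np.sum(outer).item()      # sum over all pairwise products
square_of_sum = (np.sum(v) ** 2).item()  # (1 + 2 + 3 + 4) ** 2

print(inner.shape, outer.shape)
print(sum_of_squares, sum_of_table, square_of_sum)
```

So `np.sum` over the m x m result mixes every element with every other element, instead of pairing each element only with its own counterpart.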

I tried making the mistake that I am theorizing you made and I get the exact same cost value you show:

```
cost = 2.0796608964759784
Error: Wrong output
1 Tests passed
1 Tests failed
```