Hi. I’m stuck on writing the cost function using numpy. I keep getting an error and I don’t know why. Perhaps I can’t perform np.dot on the given vectors, but the equation does not call for any transposition.
Whether you transpose is an implementation issue. It depends on how the data in the matrices are organized.
What error message do you get when you run your code? You can post your error messages - but don’t post a copy of your entire code.
ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
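For context, here is a minimal sketch that reproduces it (placeholder values, not my actual code):

```python
import numpy as np

w = np.array([[1.0, 2.0, 3.0]])   # shape (1, 3)
x = np.array([[4.0, 5.0, 6.0]])   # shape (1, 3)

print(w.shape, x.shape)           # (1, 3) (1, 3)

# np.dot needs the inner dimensions to match: (1, 3) dot (1, 3)
# tries to align 3 (columns of w) with 1 (rows of x) and fails.
np.dot(w, x)                      # raises the ValueError above
```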
Here’s a thread which discusses how to compute the cost. There are a number of ways to do it.
Here’s another one that starts directly from 1 x 3 dot 1 x 3.
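For reference, assuming the standard cross-entropy cost (the usual formula, not necessarily the exact code from those threads), the 1 x 3 dot 1 x 3 version needs just one transpose:

```python
import numpy as np

def cost(a, y):
    """Cross-entropy cost for (1, m) predictions a and labels y."""
    m = y.shape[1]
    # (1, m) dot (m, 1) -> a (1, 1) array; .item() extracts the scalar
    j = -(np.dot(y, np.log(a).T) + np.dot(1 - y, np.log(1 - a).T)) / m
    return j.item()

a = np.array([[0.8, 0.2, 0.9]])   # made-up predictions
y = np.array([[1.0, 0.0, 1.0]])   # made-up labels
print(cost(a, y))                 # ~0.184
```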
I found a solution using dot product, but I’m not sure why this is the only way I could get it to work. I suppose if I explain it I’d be giving an answer though.
For one of the dot products I did, it was necessary to index into the arrays first, but for the second it was not. They both take nested arrays as arguments, so I don’t see why one requires this while the other does not.
Moreover, applying a transposition does not work for one while it does for the other, even though we are multiplying similar dimensions. Is there something about a vector containing only one row or one column that behaves differently?
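Roughly what I mean, as a simplified reconstruction (not my actual code):

```python
import numpy as np

a2 = np.array([[1.0, 2.0, 3.0]])   # 2-D row vector, shape (1, 3)
b2 = np.array([[4.0, 5.0, 6.0]])   # 2-D row vector, shape (1, 3)

# Transposing works here: (1, 3) dot (3, 1) -> (1, 1)
print(np.dot(a2, b2.T))            # [[32.]]

a1 = a2[0]                         # inner 1-D array, shape (3,)
b1 = b2[0]                         # inner 1-D array, shape (3,)

# For the 1-D pair, transposing changes nothing (b1.T is still shape (3,)),
# yet np.dot works directly, with no transpose at all.
print(np.dot(a1, b1))              # 32.0
```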
Did you actually read the threads that I linked? This one explains that point. Note that you need to have a solid understanding of the basics of linear algebra as a prerequisite here.
Oh, sorry, I did not read it because I found a solution myself. I will do so now.
I read it. So, is this a fair way to understand it? When you take np.dot of two 2-D arrays and one of the dimensions is of size 1, at the first level of depth the arrays contained in the first argument are spread across the values of the second argument, taking the form of the first argument at the shallower depth and the form of the second argument at the deeper depth. The shape (1, 4) array, when taken as the first argument, is spread across the values of the (4, 1) one, taking on the second argument’s inner shape of size 1 by multiplying the numbers of its inner shape against the (4, 1) and adding them together. The shape (4, 1) array as the first argument is spread, or stretched, through the (1, 4) second argument at its primary depth (which contains 4 elements), and then these stretched elements take the shape of the second argument’s inner size.
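Concretely, here is what I am trying to describe, with made-up numbers (not from the assignment):

```python
import numpy as np

row = np.array([[1.0, 2.0, 3.0, 4.0]])        # shape (1, 4)
col = np.array([[5.0], [6.0], [7.0], [8.0]])  # shape (4, 1)

# (1, 4) dot (4, 1): the four pairwise products collapse into one entry.
print(np.dot(row, col))   # [[70.]]  (1*5 + 2*6 + 3*7 + 4*8)

# (4, 1) dot (1, 4): each element of the column is stretched across the
# whole row, giving a full (4, 4) outer-product-style matrix.
print(np.dot(col, row))
# [[ 5. 10. 15. 20.]
#  [ 6. 12. 18. 24.]
#  [ 7. 14. 21. 28.]
#  [ 8. 16. 24. 32.]]
```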
Instead of trying to do “artistic interpretations” of what dot products mean, I suggest you just “play out” the dot products. That was the point of that thread I pointed you to. This is all very concrete and practical. You need to understand what “matrix multiply” means at the level of the atomic operations. It’s a very concrete mathematical operation, which you need to understand a priori in order to succeed here.
I best understand how to make practical use of operations when I can explain them. Creating a dialogue in my head that rationalizes the essence of what I am using helps me use it more effectively. By “understanding a priori”, do you mean that you must have background knowledge of linear algebra, or some abstract form of intelligence that lends itself to understanding this concept, which some are born with and some are not?
I’ve been looking into dot products and matrix multiplication, but the vector dot product and the multiplication of matrices seem to require a different set of rules when we tell the computer “hey, this is what np.dot should perform”. I understand that matrix multiplication can be explained as linear transformations, which are identical in essence to the linear transformation of a dot product of two vectors (in one case a vector transforms a vector, in another a matrix transforms a matrix, and lastly a matrix can transform a vector and vice versa). But my inkling is that the programming language must have some set of rules for how np.dot should work when dealing with matrices, as opposed to simply two vectors, something like: “if at least one argument is a matrix, perform matrix multiplication; else compute the dot product”. Otherwise I do not have the a priori knowledge to understand this concept.

As in the example given, m x 1 dot 1 x m gives an m x m matrix, and conversely the opposite. As I understand it, for matrix multiplication to be defined, the number of columns in the first matrix must equal the number of rows in the second matrix. Therefore the dot product of two vectors “operationally”, or “at the level of atomic operations”, cannot be the same. If this is not the case, how? Where have I gone wrong? Please do not tell me to read through the numpy documentation; it does not explain itself in a way that lends itself to conceptual understanding.
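To make my conjecture concrete, here is the kind of rule set I imagine, probed empirically with made-up values:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])      # 1-D, shape (3,)
w = np.array([4.0, 5.0, 6.0])      # 1-D, shape (3,)
M = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])         # 2-D, shape (3, 2)

print(np.dot(v, w))    # 32.0      -- two 1-D arrays: a plain dot product (scalar)
print(np.dot(v, M))    # [4. 5.]   -- 1-D with 2-D: v acts like a row vector
print(np.dot(M.T, v))  # [4. 5.]   -- 2-D with 1-D: sum-product over matching axes
print(np.dot(M.T, M))  # a (2, 2) matrix -- two 2-D arrays: matrix multiplication
```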
I simply meant that you need to have taken at least high school level Linear Algebra (or the online equivalent thereof) as a prerequisite here. You don’t need anything as sophisticated as eigenvalues, but you have to understand how algebraic operations on matrices and vectors work. If you understand matrix multiplication, then you should easily be able to see that the dot product of a row vector (one row of the first matrix) and a column vector (one column of the second matrix) is the atomic operation that is the basis of matrix multiplication. Each element of the product matrix is the result of a single vector dot product between a row of the first operand and a column of the second operand. What np.dot does is fundamentally the same for vectors and matrices; it just has more work to do in the matrix case. A row vector is a matrix with one row and a column vector is a matrix with one column. If you think of it that way, then it’s clear that the operations are the same.
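Here is a small sketch of that claim, using arbitrary matrices; every element of the output really is one row-by-column dot product:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])       # shape (2, 2)
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])       # shape (2, 2)

C = np.dot(A, B)                 # full matrix multiplication

# Element (i, j) of the product is the dot product of
# row i of A with column j of B: the atomic operation.
for i in range(2):
    for j in range(2):
        assert C[i, j] == np.dot(A[i, :], B[:, j])

print(C)
# [[19. 22.]
#  [43. 50.]]
```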
If you multiply a column vector by a column vector, that is not matrix multiplication; it is the dot product. You cannot do matrix multiplication in this case because the number of columns of, say, a 3x1 column vector does not equal the number of rows of a vector of the same dimensions, or of any other column vector. The dot product and matrix multiplication are not necessarily the same thing, I think.
You can’t do np.dot with two column vectors. You also can’t multiply a matrix by another matrix if the number of columns of the first operand does not match the number of rows of the second. Notice what I said about the first operand being a row vector and the second being a column vector with the same number of elements. Now draw yourself two matrices on a piece of paper, using small integer values. Make the first one 2 x 3 and the second one 3 x 4. Actually work out the steps of matrix multiplication and watch what happens at each position of the output (which will be 2 x 4, of course). This is what I mean: it is not about feelings or intuitions. It’s a very concrete, particular operation. You are trying to do it all with imagination, but it’s about actual, real operations. Please do them and watch what happens; understanding comes from seeing what actually happens. There is a numpy version of the same exercise below that you can use to check your paper work.
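Here is that exercise in numpy (any small integers will do):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])            # 2 x 3
B = np.array([[1, 0, 2, 1],
              [0, 1, 1, 2],
              [1, 1, 0, 0]])         # 3 x 4

C = np.dot(A, B)                     # 2 x 4, as expected
print(C)
# [[ 4  5  4  5]
#  [10 11 13 14]]

# Each entry is one row-by-column dot product, e.g. position (0, 0):
print(np.dot(A[0, :], B[:, 0]))      # 4  (1*1 + 2*0 + 3*1)
```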
You actually can do np.dot with column vectors; they just have to be 1-D arrays, not 2-D. To compute the cost function I took the 0th index of the 2-D arrays to access the inner 1-D arrays, instead of transposing.
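Something like this, simplified with placeholder values rather than my actual cost code:

```python
import numpy as np

a = np.array([[0.2, 0.7, 0.9]])   # shape (1, 3)
y = np.array([[0.0, 1.0, 1.0]])   # shape (1, 3)

# a[0] and y[0] are the inner 1-D arrays, shape (3,).
# np.dot on two 1-D arrays just works, with no transpose needed.
print(np.dot(a[0], y[0]))         # 1.6  (0.0 + 0.7 + 0.9)
```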
As for matrix multiplication, I admit I had not done any at the beginning of this thread. Sorry for not clarifying that I had actually used a matrix-calculator website that shows all the steps of multiplication and tried different-sized vectors. That helped me out a lot. I should probably do some on paper like you said, though.
A 1D vector by definition cannot be a “column vector” or a “row vector”. The whole point of 1D is that they don’t have an orientation. So yes, you can dot two 1D vectors together if they contain the same number of elements, but that does not count as dotting two column vectors.
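A quick way to see the distinction, with arbitrary values:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # 1-D: shape (3,), no orientation at all
print(np.dot(v, v))             # 14.0 -- dotting two 1-D arrays is fine

c = v.reshape(3, 1)             # a true 2-D column vector, shape (3, 1)
# np.dot(c, c) would raise: shapes (3,1) and (3,1) not aligned
print(np.dot(c.T, c))           # [[14.]] -- columns need an explicit transpose
```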
Numpy.org states:
- If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
Perhaps I am misunderstanding this point. What is the inner product? What is meant by “without complex conjugation”?
If this is a more general question, just about the meaning of that particular statement in the numpy dot documentation, then the answer is not really relevant to the course materials, for (at least) two reasons:
- We specifically avoid using 1D vectors.
- The complex conjugation issue comes up only when dealing with complex-valued objects; we deal only with real numbers here.
For real vector spaces, “inner product” and “dot product” are equivalent. “Inner product” is a generalization of the dot product to arbitrary vector spaces, which can include cases in which the values are complex numbers (as in $i = \sqrt{-1}$). Here’s a MathExchange article that discusses this, and here’s the Wikipedia article on Inner Product Spaces.
But this is really beyond the scope of anything we are doing here …
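That said, purely for illustration: numpy’s vdot conjugates its first argument, so comparing it with dot on complex inputs shows exactly what the conjugation does:

```python
import numpy as np

a = np.array([1 + 2j, 3 - 1j])
b = np.array([2 + 0j, 1 + 1j])

# np.dot multiplies elementwise and sums, with no conjugation.
print(np.dot(a, b))    # (6+6j)

# np.vdot conjugates its first argument: the true complex inner product.
print(np.vdot(a, b))   # (4+0j)
```

For real-valued arrays the two functions agree, which is why the distinction never matters in this course.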