I’m having a lot of difficulty mentally translating between math/cs/physics notation.

In a strictly math class, [1 2] would be a 1x2 matrix corresponding to the vector components i=1 j=2 for the 2D vector described by its transpose (vertical 1 over 2 — sorry not sure how to write that here).

In the screenshot below, printing the vector describes a 1x2 vector, whereas the actual visual is a vertical 1 over 2… doesn’t that mean that [1 2] should be labeled as a matrix? Why are the vector and its transpose in the image both being labeled as vectors?

“First, in matrix algebra, there are two shapes of vectors: column form (n x 1) and row form (1 x n).”
(1) I thought only “column vectors” are defined by (mx1) and “row vectors” by (1xn) — not ALL vectors, which can be defined by (m x n) and theoretically can scale to infinite dimensions?

“Python then makes things even more complex by allowing vectors that are 2D but have an undefined dimension, such as (n ,) and (, n).”
(2) How can a 2D vector have an undefined dimension?

(1) (m x n) is a matrix, not a vector. Technically even a vector is a degenerate matrix, where one of the dimensions has size 1.

(2) It’s a mystery of Python. It allows you to supply the actual value later. Typically in machine learning, it’s the first dimension, which defines the number of examples.
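
For what it's worth, here is a minimal sketch of what those shapes look like in practice, assuming the quoted text is talking about NumPy arrays (the thread doesn't say which library). In NumPy, a shape of (n,) is a 1-D array with a single axis of length n, not a 2-D array with a missing dimension. The variable names are just for illustration.

```python
import numpy as np

v = np.array([1, 2])      # 1-D array: one axis, shape (2,)
row = v.reshape(1, -1)    # 2-D row vector, shape (1, 2); -1 lets NumPy infer the size
col = v.reshape(-1, 1)    # 2-D column vector, shape (2, 1)

print(v.ndim, v.shape)    # number of axes and shape of the 1-D array
print(row.shape, col.shape)

# Transposing a 1-D array changes nothing, because there is only one axis.
print(v.T.shape)
# Transposing a 2-D row vector gives a column vector.
print(row.T.shape)
```

So printing [1 2] shows a 1-D array, which is neither a row nor a column until you reshape it into one.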

(1) After going through many, many different textbooks, I am starting to believe you. It’s weirdly not stated explicitly. I think my confusion was stemming from how e.g. a 2x2 matrix can be used to describe two 2D vectors, and there are mathematical+physical implications of visualizing the columns as individual vectors. I think I’m on the right track now but will comment if this comes up again. (Redoing all the labs+assignments, I passed but am not sure I understand what I did.)

(2) I’ve got nothin’ for this one but thank you for the response. Do you have any recommended resources to read more about this point?

There’s no universal agreement about the orientation of the matrix of examples or the weight vector.

Consider the following situation:

X is a matrix containing the training examples. Its size is (m x n), where ‘m’ is the number of examples, and ‘n’ is the number of features.

This essentially means that each example is a row vector, that is stored in the X matrix.

Now if ‘w’ is a column vector that holds the weights for each feature, its size can be (n x 1).

So when you want to compute the linear predictions, you have this: f = X * w + b
…where the size of (X * w) is (m x n) * (n x 1), which gives an (m x 1) result. The multiplication is a matrix product: each entry of the result is the dot product of one row of X with w. The result is a column vector. Then the scalar ‘b’ is added to every element of that column vector.

So you have a nice clean computational solution that requires no transpositions.
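
A rough NumPy sketch of that computation, with made-up sizes and values (m = 4, n = 3, and the numbers in X, w and b are purely illustrative). Note that in NumPy the matrix product is written with `@`; `*` would multiply element-wise.

```python
import numpy as np

m, n = 4, 3                                      # hypothetical: 4 examples, 3 features
X = np.arange(m * n, dtype=float).reshape(m, n)  # (m, n): each row is one example
w = np.array([[0.5], [-1.0], [2.0]])             # (n, 1): column vector of weights
b = 0.25                                         # scalar bias

f = X @ w + b          # (m, n) @ (n, 1) -> (m, 1); b is broadcast to every row
print(f.shape)

# The same weights stored as a 1-D array of shape (n,) also work,
# but the result is then 1-D with shape (m,) instead of (m, 1).
w_1d = w.ravel()
f_1d = X @ w_1d + b
print(f_1d.shape)
```

The values come out the same either way; only the shape of the result differs, which is usually where the confusion shows up later when comparing predictions against a target array.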