What's the intuition of defining Weight matrix with features as column vector in the numpy implementation?

I am currently going through the Week 1 material of the 2nd course in the Machine Learning Specialization, and this is the first time I have come across a W matrix where each unit's parameters (w1_1, w1_2, w1_3, ...) are stored as a column vector.

For example, in the CoffeeRoastingNumPy Optional Lab, W was defined as:

```python
W = np.array([[-8.93,  0.29, 12.9 ],
              [-0.1,  -7.32, 10.81]])
units = W.shape[1]
```

What’s the intuition for this ?

Instead, why not follow the same structure as in Course 1? Like:

```python
W1 = np.array([[-8.93, -0.1 ],
               [ 0.29, -7.32],
               [12.9,  10.81]])
units = W1.shape[0]
```

In this example, the training set X is:

```python
X = np.array([[200, 13.9],
              [200, 17  ]])
```

FYI: I tried out a row-vector implementation for W, and it gave the same results. So I just want to understand whether there is a good reason for using one approach over the other.

Follow-up [Minor]:
I need help with an error, as I am new to the Python, NumPy, and TensorFlow libraries.
When I run the same code module from the Optional Lab on my personal laptop, I get a "DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated" (screenshot attached). Nothing I found online made much sense to me, so I would appreciate some help here.

Thanks a lot !! Cheers

Hi @ismareth

For the matrix shape: the lab stores W with shape (num_inputs, num_units), so each column holds the weights of one unit. With the training examples stacked as rows of X (shape (m, num_inputs)), the whole batch can be computed in one matrix product, Z = X @ W + b, which is efficient and matches what many ML frameworks do internally. Both layouts work; the transposed layout just requires a transpose somewhere in the product.
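A minimal sketch comparing the two layouts, using the W and X values from the lab (the bias values here are assumed for illustration):

```python
import numpy as np

# Columns-as-units layout from the lab: shape (num_inputs, num_units).
W = np.array([[-8.93,  0.29, 12.9 ],
              [-0.1,  -7.32, 10.81]])
b = np.array([-9.82, -9.28, 0.96])   # assumed bias values, one per unit

# A batch of examples, one example per row: shape (m, num_inputs).
X = np.array([[200.0, 13.9],
              [200.0, 17.0]])

# Vectorized forward step for the whole batch in one product.
Z = X @ W + b                        # shape (m, num_units)

# The transposed layout (units as rows) produces the same numbers;
# it just needs a transpose inside the product.
W_rows = W.T                         # shape (num_units, num_inputs)
Z_alt = X @ W_rows.T + b

print(np.allclose(Z, Z_alt))  # True
```

Either orientation works; the choice only moves where the transpose appears in the forward-pass expression.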

As for the deprecation warning: you’re likely assigning a 1-element NumPy array to a single array slot (`p[i,0] = ...`). Recent NumPy versions deprecate that implicit array-to-scalar conversion, so extract the scalar explicitly: replace `p[i,0] = my_sequential(...)` with `p[i,0] = my_sequential(...).item()` to fix it.
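A minimal sketch of the warning and the fix, with a hypothetical 1-element array standing in for the model's output:

```python
import numpy as np

p = np.zeros((2, 1))
pred = np.array([0.97])  # 1-element array, as a model call might return

# On recent NumPy, this implicit array-to-scalar conversion triggers the
# DeprecationWarning (and will eventually become an error):
# p[0, 0] = pred

# Extract the scalar explicitly instead:
p[0, 0] = pred.item()
print(p[0, 0])  # 0.97
```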

Hope it helps! Feel free to ask if you need further assistance.

There is no universal standard in the ML industry for the orientation of either the X or W matrices. You’ll find both possible orientations used in equal measure.


Thank you for your response.
Your solution to the deprecation warning worked for me as well.

However, I didn’t quite understand your explanation of how features-as-columns “simplifies broadcasting and batch operations”. I hope it becomes clearer in next week’s lectures.

You’re welcome! Think of it this way: storing one unit’s weights per column of W lets you process a whole batch of examples (one per row of X) with a single Z = X @ W + b, and the bias vector b broadcasts across all rows automatically. It’s mainly about efficient computation across batches.
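To make the broadcasting point concrete, here is a small sketch with hypothetical sizes (2 inputs, 3 units, batch of 4) showing that the single matrix product matches an explicit double loop over examples and units:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 2))   # one example per row
W = rng.standard_normal((2, 3))   # one unit per column
b = rng.standard_normal(3)        # one bias per unit

Z = X @ W + b                     # b broadcasts across all 4 rows

# Same result computed one example and one unit at a time:
Z_loop = np.empty((4, 3))
for i in range(4):
    for j in range(3):
        Z_loop[i, j] = X[i] @ W[:, j] + b[j]

print(np.allclose(Z, Z_loop))  # True
```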