Is the reason we transpose a matrix to orient it so that the dot product produces our intended results?

hazingo · December 19, 2022, 12:05am

What confuses me is why we would want to transpose a matrix rather than orient it in the right shape from the start, so that it is ready for the dot product later.

Is there some sort of structure? For example, for a training set “X”, the columns represent the input while the rows represent the different training examples, hence 4 by m? That raises a question for me: why do the columns represent the input and not the rows? Is there some intuitive reasoning?

Thank you.

paulinpaloalto · December 19, 2022, 12:40am

How we define the data is all just choices. There is no intrinsic reason why the samples are the columns of X rather than the rows. It is just a choice that Prof Ng has made. If you took the original Stanford Machine Learning course, he did it differently there. Of course lots of consequences follow from this choice.

People often ask why we have to transpose the weight vector w in Logistic Regression:

z = w^T \cdot x + b

Whereas when we get to full Neural Networks in Week 3, we no longer need to transpose W:

z = W \cdot x + b

The answer is that these are also choices that Prof Ng has made: he uses the convention that any standalone vector is a column vector. That applies to both w the weight vector and x the sample vector, so we need to transpose w in order for the dot product to work.

But when he defines the W matrices for neural networks, he chooses to stack the weights for each neuron as a row of W and then we don’t need the transpose.
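A minimal NumPy sketch of the two conventions described above may help. The sizes (4 features, 3 neurons), the random values, and the variable names are assumptions chosen only for illustration; they are not taken from the course notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x = 4  # number of input features (assumed for illustration)
n_h = 3  # number of neurons in the layer (assumed for illustration)

# Logistic regression: w and x are both column vectors of shape (n_x, 1),
# so w must be transposed to get a (1, n_x) @ (n_x, 1) product.
x = rng.standard_normal((n_x, 1))
w = rng.standard_normal((n_x, 1))
b = 0.5
z_logistic = w.T @ x + b          # shape (1, 1)

# Neural network layer: each row of W holds the weights of one neuron,
# so W already has shape (n_h, n_x) and no transpose is needed.
W = rng.standard_normal((n_h, n_x))
b_vec = np.zeros((n_h, 1))
z_layer = W @ x + b_vec           # shape (n_h, 1)

# Row 0 of W plays the same role that w.T played above.
z_first_neuron = W[0:1, :] @ x + b_vec[0]

print(z_logistic.shape, z_layer.shape)
```

Stacking the per-neuron weight vectors as rows is what absorbs the transpose: each row of W is effectively an already-transposed w.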

hazingo · December 19, 2022, 10:39pm

Thank you, that makes a lot of sense now.

tbhaxor · August 3, 2024, 10:31pm

It is just used to make the two matrices compatible for multiplication. Suppose we have m records and each of them has n features. The matrix A will be of shape m \times n.

Now, we decide to have p nodes in the first hidden layer. To fully connect them, each node needs n weights, one per feature. The first layer can be represented as a matrix H of shape p \times n.

As you can see, multiplying these matrices directly is not possible because the shapes do not match (the inner dimensions, n and p, differ in general). Therefore the only way is to transpose one of them. Which one you transpose is a matter of choice. Generally we transpose the weight matrix and keep the input matrix unchanged.

The shape of H^T is n \times p, so we can now perform the multiplication A \cdot H^T, yielding shape m \times p. As you can see, the batch size is unchanged, so the next hidden layer receives the same number of records, but the number of features has changed to p, the number of nodes in the first hidden layer.
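The shape bookkeeping above can be checked with a short NumPy sketch. The concrete sizes (m = 5, n = 4, p = 3) and the random data are assumptions for illustration only.

```python
import numpy as np

m, n, p = 5, 4, 3  # records, features, hidden nodes (assumed sizes)
rng = np.random.default_rng(0)

A = rng.standard_normal((m, n))  # one record per row
H = rng.standard_normal((p, n))  # one hidden node per row

# (m, n) @ (p, n) fails because the inner dimensions n and p differ.
try:
    A @ H
    direct_ok = True
except ValueError:
    direct_ok = False

# Transposing H gives (m, n) @ (n, p) -> (m, p).
out = A @ H.T

print(direct_ok, out.shape)
```

The number of rows (records) is preserved, while the number of columns becomes the number of nodes in the layer.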

This is how I learnt the rationale. It still doesn’t completely make sense to me, though, so I am working toward more robust reasoning.

paulinpaloalto · August 4, 2024, 1:42am

That is one way you could choose to arrange the matrix. Note that the point of my previous reply is that this is not how Prof Ng arranges the data in DLS C1 and DLS C2, and this question was asked in the category of DLS C1. There he uses the arrangement that the columns of A are the individual sample vectors, so in Prof Ng’s scheme A would be n x m.
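As a rough sketch of how the two arrangements relate, the NumPy snippet below compares the samples-as-columns layout with the samples-as-rows layout. The sizes, the random data, and the names `A_rows`, `A_cols`, `Z_rows`, and `Z_cols` are hypothetical and chosen only for illustration.

```python
import numpy as np

m, n, p = 5, 4, 3  # samples, features, neurons (assumed sizes)
rng = np.random.default_rng(0)

W = rng.standard_normal((p, n))       # one neuron per row
A_rows = rng.standard_normal((m, n))  # samples as rows
A_cols = A_rows.T                     # samples as columns, shape (n, m)

# Samples as columns: no transpose of W is needed.
Z_cols = W @ A_cols                   # shape (p, m)

# Samples as rows: W must be transposed instead.
Z_rows = A_rows @ W.T                 # shape (m, p)

# Both layouts compute the same numbers, just transposed.
same = np.allclose(Z_cols, Z_rows.T)
print(Z_cols.shape, Z_rows.shape, same)
```

This follows from the identity (A W^T)^T = W A^T, so the choice of layout only moves where the transpose appears.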

TMosh · August 5, 2024, 2:19am

Not necessarily. There is no universal standard for the matrix shapes.

tbhaxor · January 20, 2025, 11:10am

Yes, but at least in pandas, rows are the distinct observations and columns are the features of each record. Again we have n x m, which is the transpose of m x n. I didn’t want to include it because you can’t use a term you are explaining in the explanation itself.

TMosh · January 20, 2025, 4:52pm

That appears to be the standard for Pandas. It’s not a universal standard.
