Why is the weight matrix transposed in logistic regression but not in a neural network?

The Z calculation for a neural network layer is:

Z = W·X + b

Please note that here W is NOT transposed.

However, for logistic regression it is transposed: z = wᵀ·x + b

What’s the reason for flipping the weight matrix?


We apply the dot product between W and X to form the expression w1x1 + w2x2 + ... + wnxn for each example x. The dot product requires that the number of columns in the first matrix match the number of rows in the second: A (p, q) · B (q, r) gives a result of shape (p, r).
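A minimal NumPy sketch of that dimension rule (the sizes p, q, r here are arbitrary, just for illustration):

```python
import numpy as np

# Shapes chosen for illustration: A is (p, q), B is (q, r).
p, q, r = 2, 3, 4
A = np.ones((p, q))
B = np.ones((q, r))

# The inner dimensions (q and q) match, so the product is defined
# and the result has shape (p, r).
C = A @ B
print(C.shape)  # (2, 4)
```

Swapping the operands to `B @ A` would raise an error, because (q, r) · (p, q) has mismatched inner dimensions.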

In logistic regression, W and X are initialized with the same orientation, since there is a one-to-one relation between weights and features. Therefore, we apply a transpose to make them dot-product compatible, so that the dot product yields a single value per example x. That value is the example's raw prediction z (which the sigmoid then turns into an output).
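Here is a sketch of that shape mismatch, with hypothetical sizes n = 4 features and m = 5 examples:

```python
import numpy as np

n, m = 4, 5                  # illustrative sizes: n features, m examples
w = np.random.randn(n, 1)    # weights as a column vector, same orientation as one example
b = 0.0
X = np.random.randn(n, m)    # each column of X is one example

# w is (n, 1) and X is (n, m): the inner dimensions (1 and n) do not
# match, so we transpose w to (1, n) before taking the product.
z = w.T @ X + b              # (1, n) @ (n, m) -> (1, m): one z per example
print(z.shape)  # (1, 5)
```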

In a neural network, the matrix orientations are different. The first layer's W has dimensions (number of units in the first layer) × (number of input features): each row holds the weights one unit applies to all of the input features, a many-to-many relation. X has dimensions (number of features) × (number of examples). Since the matrices are already dot-product compatible, there is no need to transpose either one. Similarly, in the second, third, and deeper layers, W has dimensions (number of units in the layer) × (number of units in the previous layer), which makes it compatible with the previous layer's A matrix. The final layer's W has dimensions (number of output classes) × (number of units in the previous layer), which maps the computed values onto the output classes.


Thanks for your answer. My question is: why are the orientations different in the first place? It seems like a strange convention.